Reza Borhani

We actually do use other function approximators: in fact polynomials were the first provable universal approximators, this having been shown in 1885 via the so-called Stone–Weierstrass approximation theorem.

The Fourier basis (and its discrete derivatives) is another extremely popular function approximation tool, used particularly in physics, signal processing, and engineering fields.

These function approximators work fine, especially in low input dimensions. As the input dimension [math]N[/math] increases, however, so does the number of basis elements in a polynomial or Fourier basis, and it does so combinatorially fast. For example, the number of polynomial terms [math]M[/math] in a degree-[math]D[/math] polynomial is given by

[math]M=\left(\begin{array}{c}N+D\\D\end{array}\right)[/math]
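This growth is easy to check numerically, for example with Python's `math.comb` (the choice of degree D = 5 below is mine, just for illustration):

```python
from math import comb

# Number of terms in a degree-D polynomial in N variables: M = C(N+D, D)
def num_poly_terms(N, D):
    return comb(N + D, D)

# The basis explodes as the input dimension N grows, even at fixed degree
for N in (1, 10, 100):
    print(N, num_poly_terms(N, 5))   # 6, 3003, 96560646
```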

Fortunately, there’s a remedy here called the kernel trick; this is how, for example, SVMs work. Kernel methods have serious scaling problems of their own, but I’m starting to get off topic here.
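To make the kernel trick concrete (this toy check is my own, not from the original answer): the degree-2 polynomial kernel can be evaluated directly, without ever building the explicit feature vector it implicitly corresponds to.

```python
import numpy as np

def phi(v):
    # Explicit degree-2 feature map for the kernel (1 + x.y)^2 in 2-D
    x1, x2 = v
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 * x1, x2 * x2,
                     np.sqrt(2) * x1 * x2])

def poly_kernel(x, y):
    # Same inner product, computed without forming phi at all
    return (1.0 + np.dot(x, y)) ** 2

x = np.array([0.5, -1.0])
y = np.array([2.0, 3.0])
print(np.dot(phi(x), phi(y)), poly_kernel(x, y))   # both equal 1.0
```

In higher degree and dimension the explicit map `phi` has combinatorially many components, while the kernel evaluation stays a single dot product — which is exactly the remedy being described.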

Here’s a simple experiment showing how one can use polynomial, Fourier, or neural network bases for function approximation.
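The original experiment isn't reproduced here, but a minimal stand-in (the target function, basis sizes, and grid are my own choices) might look like:

```python
import numpy as np

# Target function to approximate on [0, 1]
def f(x):
    return np.sin(2 * np.pi * x) + 0.3 * x**2

x = np.linspace(0, 1, 200)
y = f(x)

# Degree-9 polynomial basis: columns 1, x, x^2, ..., x^9
A_poly = np.vander(x, 10, increasing=True)

# Fourier basis: constant plus sin/cos pairs up to frequency 4
A_four = np.column_stack(
    [np.ones_like(x)] +
    [trig(2 * np.pi * k * x) for k in range(1, 5) for trig in (np.sin, np.cos)])

errs = {}
for name, A in [("poly", A_poly), ("fourier", A_four)]:
    w, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares coefficients
    errs[name] = np.max(np.abs(A @ w - y))      # max error on the grid
print(errs)
```

Both bases drive the error down on this smooth 1-D target; the point of the answer is that the same recipe becomes impractical as the input dimension grows.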

Mona Jalal

Here is a list of ML approaches that can serve as function approximators:
  1. Regression tree
  2. Kernel regressors
  3. SVM
  4. Gaussian Process/Gaussian function mixture
  5. Random forest
  6. Kernel density estimator
  7. Radial basis function network

And in case you are interested, here is a list of non-ML approaches:

  1. Polynomial fitting
  2. Taylor series
  3. Fourier series
  4. Wavelets
  5. Lagrange interpolation
  6. Chebyshev series


Assistant

In machine learning, the term "universal function approximator" refers to a model that can approximate any continuous function to a desired level of accuracy, given sufficient capacity and appropriate training. While neural networks are the most commonly cited universal function approximators, there are several other methods and frameworks that can also achieve this property:

  1. Polynomial Functions: Polynomial regression can approximate continuous functions by using polynomials of sufficient degree. However, it may struggle with complex or high-dimensional data.
  2. Support Vector Machines (SVMs): With the use of kernel functions (like the radial basis function), SVMs can approximate complex decision boundaries and thus serve as universal approximators.
  3. Gaussian Processes: These are non-parametric models that can approximate functions by defining a distribution over functions. They can model complex relationships and provide uncertainty estimates.
  4. Radial Basis Function (RBF) Networks: These networks use radial basis functions as activation functions and can approximate any continuous function with a sufficient number of basis functions.
  5. Decision Trees and Ensembles: Single decision trees can approximate functions, but they may not generalize well. However, ensembles of decision trees (like Random Forests and Gradient Boosting Machines) can effectively approximate complex functions.
  6. k-Nearest Neighbors (k-NN): This algorithm can approximate functions by averaging the outputs of the k closest training examples, effectively capturing local patterns in the data.
  7. Fourier Series: Functions can be approximated using Fourier series, which represent functions as sums of sine and cosine terms. This is particularly useful for periodic functions.
  8. Wavelet Transforms: Wavelets can be used to represent functions at different scales and can approximate a wide range of functions, especially those with localized features.
  9. Linear Combinations of Basis Functions: Any linear combination of a set of basis functions (e.g., splines, Fourier bases) can approximate continuous functions, provided the basis set is sufficiently rich.
  10. Linear Regressions with Interaction Terms: By including interaction terms and polynomial features, linear regression can approximate more complex functions.

Each of these methods has its own strengths and weaknesses, and the choice of method often depends on the specific characteristics of the data and the problem being addressed.
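As a tiny illustration of item 6, a k-nearest-neighbors regressor takes only a few lines of NumPy (the helper name and toy target below are my own):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    # Average the targets of the k nearest training points (item 6 above)
    d = np.abs(X_train - x)          # distances in 1-D
    idx = np.argsort(d)[:k]          # indices of the k closest samples
    return y_train[idx].mean()

# Toy data: samples of y = x^2 on a grid
X = np.linspace(0, 1, 101)
y = X**2
print(knn_predict(X, y, 0.5, k=3))   # close to 0.25
```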

Quora User

This might not be a satisfactory answer, but neural networks are in essence a summation of sigmoid-like functions stacked on top of each other.

In some sense, the Fourier series is a universal function approximator, especially if your universe is the set of periodic functions.

And in another sense, as long as you zoom in far enough on some local region, most learning algorithms can approximate a function within that region.

Take it how you will! Hope it was helpful.

Kevin Cameron

This is a question that I asked many years ago because I was interested in building a SPICE simulator that could reduce the transistor equations for a block to a single equation for the block (which eliminates awkward internal nodes). Unfortunately the only technique I’ve seen so far uses neural network models.

An alternative technique for SPICE is to use table-driven models — precalculate answers for particular points and interpolate.

Which techniques work will depend heavily on the data sets and algorithms used. You might be able to use genetic algorithms to guess at a short form function and then use SAT solvers to validate it.
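The table-driven idea above is easy to sketch; here is a 1-D stand-in, with `np.interp` doing the linear interpolation (the "device equation" is just a placeholder of mine):

```python
import numpy as np

# Precompute the "expensive" model at a grid of bias points...
v_grid = np.linspace(0.0, 1.0, 11)
i_grid = np.exp(v_grid) - 1.0        # stand-in for a transistor equation

# ...then answer queries by interpolating the table instead of re-evaluating
def table_model(v):
    return np.interp(v, v_grid, i_grid)

print(table_model(0.55))   # close to exp(0.55) - 1
```

The accuracy/memory trade-off is set by the grid density, and in a real simulator the table would be multi-dimensional (over all terminal voltages).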

Profile photo for Ravi Singh

Large number of IT working professionals 💼 in the software field are transitioning to Data Science roles. This is one of the biggest tech shifts happening in IT since last 20 Years. If you’re a working professional reading this post, you’ve likely witnessed this shift in your current company also. So Multiple Data science Courses are available online gain expertise in Data Science.

Logicmojo is an Best online platform out of them that offers live Data Science and AI certification courses for working professionals who wish to upskill 🚀 their careers or transition into a Data Scientist role. Th

Large number of IT working professionals 💼 in the software field are transitioning to Data Science roles. This is one of the biggest tech shifts happening in IT since last 20 Years. If you’re a working professional reading this post, you’ve likely witnessed this shift in your current company also. So Multiple Data science Courses are available online gain expertise in Data Science.

Logicmojo is an Best online platform out of them that offers live Data Science and AI certification courses for working professionals who wish to upskill 🚀 their careers or transition into a Data Scientist role. They focus on these two key 🤹‍♀️🤹‍♀️ aspects:

✅ Teaching candidates advanced Data Science and ML/AI concepts, followed by real-time projects. These projects add significant value to your resume.

✅ Assisting candidates in securing job placements through their job assistance program for Data Scientist or ML Engineer roles in product companies.

Once you have a solid portfolio of Data Science projects on your resume 📝 , you’ll get interview calls for Data Scientist or ML Engineer roles in product companies.

So, to secure a job in IT companies with a competitive salary 💰💸 , it’s crucial for software developers, SDEs, architects, and technical leads to include Data Science and Machine Learning skills in their skill-set 🍀✨. Those who align their skills with the current market will thrive in IT for the long term with better pay packages.

Recently in last few years, software engineer roles have decreased 📉 by 70% in the market, and many MAANG companies are laying off employees because they are now incorporating Data Science and AI into their projects. On the other hand, roles for Data Scientists, ML Engineers, and AI Engineers have increased 📈 by 85% in recent years, and this growth is expected to continue exponentially.

Self-paced preparation 👩🏻‍💻 for Data Science might take many years⌛, as learning all the new tech stacks from scratch requires a lot of time. Just Learning technical knowledge is not enough 🙄, you also need to have project experience in some live projects that you can showcase in your resume 📄. Based on these project experience only you will be shortlisted to Data Scientist roles. So,If you want a structured way of learning Data Science and Machine Learning/AI, it’s important to follow a curriculum that includes multiple projects across different domains.

Logicmojo's Data Science Live Classes offer 12+ real-time projects and 2+ capstone projects. These weekend live classes are designed for working professionals who want to transition from the software field to the Data Science domain 🚀. It is a 7-month live curriculum tailored for professionals, covering end-to-end Data Science topics with practical project implementation. After the course, the Logicmojo team provides mock interviews, resume preparation, and job assistance for product companies seeking Data Scientists and ML Engineers.

So, whether you are looking to switch your current job to a Data Scientist role or start a new career in Data Science, Logicmojo offers live interactive classes with placement assistance. You can also 👉 contact them for a detailed discussion with a senior Data Scientist with over 12+ years of experience. Based on your experience, they can guide you better over a call.

Remember, you need to upgrade 🚀 your tech skills to match the market trends; the market won’t change to accommodate your existing skills.

Joseph Bills

There are plenty. There are even some more powerful than neural networks, like an algorithm that just tests every computable function sequentially until it finds one that generates the given data set; they are just far too inefficient to be used in practice. One that is faster than neural networks, though, is polynomial regression.
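Polynomial regression really is cheap to run; a quick sketch with NumPy's `polyfit` (the target cubic and noise level are my own choices):

```python
import numpy as np

# Fit a cubic to noisy samples of a cubic; polyfit solves the least-squares
# problem over the monomial basis 1, x, x^2, x^3
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = 2 * x**3 - x + 0.5 + 0.01 * rng.standard_normal(x.size)

coef = np.polyfit(x, y, deg=3)   # coefficients, highest degree first
print(coef)                      # close to [2, 0, -1, 0.5]
```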

Jerry Liu

I was actually curious about this myself, so I decided to implement a small script in TensorFlow to check. I’m assuming that we’re trying to learn the function [math]f(x,y) = \min(x,y)[/math].

Neural networks can represent any arbitrary function up to some epsilon, but they’re generally not designed to exactly replicate the function, so I tried to at least come up with an adequate approximation.

My simple network had two hidden layers with 3 units and 2 units respectively, with ReLU activations. The output was a simple linear combination of the last 2 hidden units. I trained with Adam at a learning rate of 0.001 for 20,000 steps. I tried plain gradient descent at first and found that training collapsed, with the network’s outputs becoming independent of the input.

If I constrained the data to a non-negative interval, [math][0,x][/math], I found that training was stable and convergence was relatively quick. For instance, setting x=100,000, here’s a sample of the predictions:

  • Data, prediction:
    • (29239, 76479), 29238.9369404545
    • (1931, 64039), 1930.9829420528908
    • (67368, 88702), 67367.895374989

However, if the interval included negative numbers (e.g. [math][-x,x][/math]), the network became much harder to train in a stable fashion. Setting x=10,000, in some runs I get:

  • (-8099, 6195), -8095.1940113855135
  • (6559, -3805), -3807.9644528298445

while in other runs I get:

  • (-1053, 3159), -1159.525077130749
  • (5897, 9739), 0.6422036529396257

and in some runs I run into the same output collapse issue as I initially had with gradient descent. I think the reason for this is that when negatives are included, the network essentially has to learn 3 different things: to pick the more negative value when both inputs are negative, to pick the negative value when exactly one is negative, and to pick the smaller positive value when both are positive.

In conclusion, neural nets can learn the min function easily if constrained to either the positive or the negative side, and less easily if the interval includes both. Hope this helps!
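As an aside (my own addition, not part of the answer above): min is actually exactly representable with a single ReLU, via min(x, y) = y - relu(y - x), so the difficulties reported here are about optimization rather than capacity.

```python
def relu(t):
    return max(t, 0.0)

def min_via_relu(x, y):
    # min(x, y) = y - relu(y - x): if y > x we subtract the gap, else nothing
    return y - relu(y - x)

# Works on the same kinds of inputs used in the experiment above
for x, y in [(-8099, 6195), (6559, -3805), (3, 3)]:
    assert min_via_relu(x, y) == min(x, y)
```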


Ross Kravitz

Consider the set of all continuous functions which are defined on the unit hypercube (i.e. the unit square in two dimensions, the unit cube in three dimensions, etc.). Call this set C.

Given two functions in C, it is possible to define a metric which calculates a notion of distance between them.

For example, if you have [math] f(x), g(x):[0,1] \rightarrow \mathbb{R} [/math], you could define the sup (supremum) distance: [math] d(f,g) := \sup_{y \in [0,1]} |f(y) - g(y)| [/math]. This calculates the maximum vertical gap between the graphs of the two functions f and g.
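The sup distance is easy to estimate numerically on a fine grid (a sketch of my own; the pair of functions is arbitrary):

```python
import numpy as np

def sup_distance(f, g, grid):
    # Approximate d(f, g) = sup |f - g| by maximizing over a fine grid
    return np.max(np.abs(f(grid) - g(grid)))

grid = np.linspace(0, 1, 10001)
# |sin(t) - t| grows monotonically on [0, 1], so the sup sits at t = 1
d = sup_distance(np.sin, lambda t: t, grid)
print(d)   # 1 - sin(1), about 0.1585
```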

Consider the set of functions of the form [math]F(x) = \sum_{i=1}^N a_i \sigma(y_i^T x + \theta_i) [/math]. As you vary the parameters in this expression, [math] N, a_i, y_i, \theta_i [/math], you sweep over all the functions which can be the output of a 2 layer neural network. Call this set of functions NN.

Stepping back, suppose you had a subset of functions D, that were very good at approximating all the other functions in C. You might formalize this idea by saying that for any element f of C, you can find an element g in D which is very close to f. Mathematically, for any [math] \epsilon > 0 [/math], and any [math] f \in C [/math], you can find a [math] g \in D [/math], such that [math] d(f, g) < \epsilon [/math]. The choice of g will depend on the particular f and epsilon. Sets with this property are called dense. Dense is a good choice since the subset D seems to fill up all the space in C, like air filling up a balloon.

So, the theorem on neural nets being universal approximators. Go back to the set NN. One can prove that NN is dense in C. Given an arbitrary function, you can find a neural net output function that is arbitrarily close to the function at all input values.

Finally, it should be said that this property of neural nets is not something mystical about them. Many classes can have this property. In particular, if you replace the sigmoid function [math] \sigma [/math] with any other bounded, increasing function, the density result is still true. On some level, it is not that hard for a subset of functions to be dense inside of C.

Luis Argerich

In an intuitive sense: if you have a function in the form of a list of inputs and outputs, there is a neural network that, given those inputs, will approximate the outputs very well.

This can be any function from "n" to "m" dimensions.
A quick example would be approximating sin(x)*cos(y). In this case your input has 2 dimensions and your output is a real number.

If you train the NN with several values of x and y and the expected outputs, the NN will learn to predict the result of the function without ever knowing what function it was.

Here's a graphical example:

y = 0.2 + 0.4*x^2 + 0.3*sin(15x) + 0.05*cos(50x)

And here is how a NN approximates this function using different numbers of hidden units:
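For anyone who wants to reproduce the setup, generating training data for that target is straightforward (the sample count and RNG seed are my choices; the plotting and network code from the original answer aren't reproduced here):

```python
import numpy as np

def target(x):
    return 0.2 + 0.4 * x**2 + 0.3 * np.sin(15 * x) + 0.05 * np.cos(50 * x)

# Random (x, y) training pairs on [0, 1] that a network would be fit to
rng = np.random.default_rng(42)
x_train = rng.uniform(0.0, 1.0, 500)
y_train = target(x_train)
print(target(0.0))   # 0.2 + 0 + 0 + 0.05 = 0.25
```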

Luis.


Asraful Sharker Nirob

Neural networks, particularly deep feedforward networks, are universal approximators, meaning they can approximate any continuous function under certain conditions. However, there are some functions that they cannot approximate well:

  1. Discontinuous Functions – Neural networks struggle with functions that have sharp discontinuities, such as the step (Heaviside) function and other functions with sudden jumps in value.
  2. Highly Oscillatory or Pathological Functions – Functions with extreme variation over small intervals, like the Dirichlet function (1 on the rationals, 0 on the irrationals), or the sign function, which changes abruptly at zero.
  3. Non-Measurable Functions – Functions that are not Lebesgue measurable, like some constructs from set theory, cannot be represented by neural networks.
  4. Computationally Undecidable Functions – Problems related to the halting problem or non-Turing-computable functions.
  5. Sparse and High-Frequency Functions – Some functions with very high-frequency details or fractal-like structures may require an impractically large network to approximate.

In practice, neural networks can approximate most real-world functions if given enough layers and training data, but they still face challenges with sharp discontinuities, infinite complexity, or logical non-computability.
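The first point can be made concrete: a steep sigmoid matches the Heaviside step arbitrarily well away from the jump, but any continuous output is stuck with an error of about 0.5 at the jump itself (the steepness k = 100 below is my own choice):

```python
import numpy as np

def heaviside(x):
    return (x > 0).astype(float)

def steep_sigmoid(x, k=100.0):
    # A single sigmoid unit with a very large input weight k
    return 1.0 / (1.0 + np.exp(-k * x))

x = np.linspace(-1, 1, 2001)
err = np.abs(steep_sigmoid(x) - heaviside(x))
# Tiny error away from the jump, but ~0.5 right at it
print(err[np.abs(x) > 0.1].max(), err.max())
```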

Christine

Yes, there is evidence that recurrent neural networks (RNNs) are universal approximators, but with some nuances. Here's a breakdown:

  • Universal Approximation Theorem: This theorem states that a sufficiently large feed-forward neural network with one hidden layer using specific activation functions can approximate any continuous function to an arbitrary degree of accuracy.
  • RNNs and Universal Approximation: While RNNs differ from standard feed-forward networks due to their recurrent connections, research suggests that they too can be considered universal approximators under certain conditions.
  • Key Papers: For feed-forward networks, the foundational work is "Multilayer feedforward networks are universal approximators" by Hornik, Stinchcombe, and White (1989). For recurrent networks, Schäfer and Zimmermann (2006) proved in "Recurrent neural networks are universal approximators" that RNNs in state-space form with a single hidden layer and a sigmoidal activation function can approximate any continuous dynamical system on a compact set arbitrarily well.
  • Nuances and Considerations: The proof relies on specific conditions, such as the network having a finite number of states and specific activation functions. The practicalities of achieving universal approximation with RNNs can be more challenging than with feed-forward networks due to issues like vanishing gradients and the complexity of training recurrent architectures. The theorem focuses on approximating continuous functions; RNNs might not be ideal for approximating all types of functions (e.g., discontinuous functions).

In conclusion, there is theoretical evidence supported by research that RNNs can be considered universal approximators under certain conditions. However, the practicalities of achieving this and the types of functions they can approximate effectively require further consideration.

Here are some additional points to keep in mind:

  • The field of neural networks is constantly evolving, and new research might provide further insights into the universal approximation capabilities of RNNs and other neural network architectures.
  • Even if a network is theoretically capable of universal approximation, training it effectively to achieve that level of accuracy can be computationally expensive and might not always be necessary for practical applications.
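For concreteness, the state-space form mentioned above boils down to a simple recurrence, sketched here in numpy (the weights are random placeholders, not a trained model):

```python
import numpy as np

# Minimal state-space RNN cell: h_t = tanh(W x_t + U h_{t-1} + b).
# Weights here are random placeholders, just to show the recurrence shape.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 5
W = rng.normal(size=(n_hid, n_in)) * 0.1
U = rng.normal(size=(n_hid, n_hid)) * 0.1
b = np.zeros(n_hid)

def rnn_step(x, h):
    return np.tanh(W @ x + U @ h + b)

h = np.zeros(n_hid)
for x in rng.normal(size=(4, n_in)):   # a sequence of 4 inputs
    h = rnn_step(x, h)
print(h.shape)   # the hidden state carries information across time steps
```

The recurrent matrix U is what gives the network its memory: the same weights are applied at every time step, and it is gradients flowing backwards through this loop that vanish or explode during training.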
Profile photo for Niklas Lang

Here are some common alternatives to the sigmoid activation function:

1. ReLU (Rectified Linear Unit)

  • Pros: Fast, simple, and effective. Helps avoid the vanishing gradient problem and speeds up training.
  • Cons: Can suffer from the "dying ReLU" problem where neurons stop learning.

2. Leaky ReLU

  • Pros: Fixes the dying ReLU problem by allowing small gradients for negative inputs.
  • Cons: Requires tuning of the leak parameter.

3. Tanh (Hyperbolic Tangent)

  • Pros: Outputs values between -1 and 1, which helps center data.
  • Cons: Still suffers from the vanishing gradient issue at extreme values.

4. ELU (Exponential Linear Unit)

  • Pros: Addresses vanishing gradients and produces negative values, which can help performance.
  • Cons: Slightly more complex and requires a new hyperparameter.

5. Swish

  • Pros: A smooth, non-monotonic function that has shown better performance in deeper networks.
  • Cons: More computationally expensive than ReLU.

6. Softmax (for multi-class problems)

  • Pros: Great for classification tasks with multiple classes, outputs probabilities for each class.
  • Cons: Not suited for binary classification or regression.
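The activations in the list above are all one-liners; here they are as plain numpy functions (a sketch for reference, not tied to any particular framework):

```python
import numpy as np

# The activations from the list above, as plain numpy functions.
def relu(x):       return np.maximum(0.0, x)
def leaky_relu(x, a=0.01): return np.where(x > 0, x, a * x)
def tanh(x):       return np.tanh(x)
def elu(x, a=1.0): return np.where(x > 0, x, a * (np.exp(x) - 1))
def swish(x):      return x / (1.0 + np.exp(-x))   # x * sigmoid(x)
def softmax(x):
    e = np.exp(x - np.max(x))   # subtract the max for numerical stability
    return e / e.sum()

x = np.array([-2.0, 0.0, 3.0])
print(relu(x), leaky_relu(x), elu(x))
print(softmax(x))   # sums to 1
```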
Profile photo for Robert A. Levinson, Phd

Looking to explore beyond the conventional sigmoid activation functions for your deep learning models? You're in luck! The world of neural networks offers a plethora of alternatives that can take your models to new heights. Let's dive into some compelling alternatives that can revolutionize your deep learning journey.

1. ReLU (Rectified Linear Unit): This superstar activation function has gained immense popularity due to its simplicity and effectiveness. ReLU sets all negative values to zero, allowing positive values to pass through. Its ability to avoid the vanishing gradient problem and accelerate training speed makes it a top choice for many deep learning practitioners.

2. Leaky ReLU: Taking ReLU a step further, Leaky ReLU introduces a small slope for negative values instead of setting them to zero. This slight modification helps overcome the dying ReLU problem, where some neurons become inactive. By introducing a small negative gradient, Leaky ReLU ensures information flow even for negative inputs.

3. ELU (Exponential Linear Unit): ELU takes inspiration from Leaky ReLU but goes a step ahead by using an exponential function for negative values. This leads to smoother learning and better generalization. ELU also handles the dying ReLU problem and provides a more robust activation function for deep neural networks.

4. SELU (Scaled Exponential Linear Unit): SELU takes the benefits of ELU and adds self-normalization capabilities. It ensures that the mean and variance of the inputs remain stable throughout the layers, promoting stable learning and reducing the need for extensive hyperparameter tuning.

5. Tanh (Hyperbolic Tangent): An alternative to sigmoid, Tanh squeezes the input values between -1 and 1. It provides a symmetric activation function, making it suitable for models that require negative values. Although it still suffers from the vanishing gradient problem, it can be a valuable alternative depending on the specific requirements of your deep learning task.

6. Softplus: Softplus is a smooth and differentiable activation function that offers a more gradual transition than ReLU. It has been found to perform well in certain scenarios, especially for models where the output needs to be positive.

These alternatives to sigmoid activation functions open up a world of possibilities for your deep learning models. Each comes with its unique advantages and may suit different use cases. Experimenting with these options can unlock new avenues for enhanced performance, improved training speed, and better generalization. So, don't hesitate to explore beyond the sigmoid and discover the perfect activation function for your deep learning masterpiece!
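SELU and Softplus, the two functions from this list not covered in the previous answer, can be written directly (the SELU constants below are the standard values from Klambauer et al., 2017):

```python
import numpy as np

# SELU's fixed self-normalizing constants (Klambauer et al., 2017).
ALPHA, SCALE = 1.6732632423543772, 1.0507009873554805

def selu(x):
    return SCALE * np.where(x > 0, x, ALPHA * (np.exp(x) - 1))

def softplus(x):
    return np.log1p(np.exp(x))   # smooth, strictly positive version of ReLU

x = np.linspace(-3, 3, 7)
print(selu(x))
print(softplus(x))   # always > 0, approaches ReLU for large |x|
```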

Profile photo for Alfred Dominic Vella

There are many learning algorithms besides neural networks. In fact not all neural networks are actually networks of neurons but because of marketing of ideas to funding bodies there is a tendency to call things whatever the latest fad is.

Machine learning, and neural networks, go back to the 1950s but we have not yet got a settled set of good algorithms.

You can get an overview from

https://www.coursera.org/specializations/machine-learning

A Tour of Machine Learning Algorithms.

Essentials of Machine Learning Algorithms (with Python and R Codes)

I have personally tried many algorithms including neural networks, genetic algorithms, ant colony optimisation, regression and decision trees.

The first that I tried was based on menace (see below) and is still one of my favourites. I used a computer but you do not need to;)

I have also examined many PhD theses on ML too and each makes an improvement on what came before.

We are, however, still a long way from understanding learning, machine and human.

Menace: the Machine Educable Noughts And Crosses Engine - Chalkdust

Profile photo for Mike West

An artificial neural network is a machine learning model that makes predictions.

These networks are composed of layers of artificial neurons.

Here’s a pic for you.

Step 1: Data is fed into a model. The data above are two separate data points, for example… a 1 and 3. NOTE: All machine learning models are monolingual, they only speak numbers. The data is fed into the first layer called the input layer. This is NOT a computational layer. It’s only there to accept the data.

Step 2: Data flows through the next layer called a hidden layer. This is a computational layer. This neural network has only one hidden layer and is the most basic neural network you can define. NOTE: A deep learning model is a neural network with many hidden layers. While there’s no agreed upon number, I asked Hinton at a conference and he said ten. So, that’s what I go with.

Step 3: The last step is the output from the patterns found in your data. It’s called the output layer and it’s also not a computational layer. It’s the answer you’ve asked the model to find.

Now, there’s a lot more to it obviously but this is a good high-level look.
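The three steps above can be sketched as a single forward pass; all the weights here are made-up numbers purely for illustration:

```python
import numpy as np

# One forward pass through the three layers described above:
# input layer (no computation) -> one hidden layer -> output layer.
x  = np.array([1.0, 3.0])                 # step 1: two input values
W1 = np.array([[0.5, -0.2], [0.1, 0.4]])  # hidden-layer weights
b1 = np.array([0.0, 0.1])
W2 = np.array([[1.0, -1.0]])              # output-layer weights
b2 = np.array([0.05])

h = np.maximum(0.0, W1 @ x + b1)          # step 2: hidden layer (ReLU)
y = W2 @ h + b2                           # step 3: output
print(y)
```

The input and output layers really are just arrays; only the hidden layer does any computation, matching the description above.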

Ready to learn real-world machine learning? Start here.

Profile photo for Rehan Jutt

In the realm of theoretical computer science, the universal Turing machine is indeed a fundamental concept. It serves as a theoretical model for computation, capable of simulating any other Turing machine and thus embodying the concept of universality in computation. However, there are other models and variations of Turing machines that have been proposed and studied over time. Here are a few:

  1. Probabilistic Turing Machines: These are Turing machines that incorporate randomness into their operations. Instead of deterministic transitions, probabilistic Turing machines have transition probabilities associated with each possible move.
  2. Quantum Turing Machines: Quantum Turing machines extend classical Turing machines to incorporate principles from quantum mechanics. They operate on quantum bits (qubits) and have quantum gates as part of their operations, allowing for potential exponential speedup over classical computation in some cases.
  3. Multi-Tape Turing Machines: In addition to the single-tape Turing machine model proposed by Alan Turing, there are variations with multiple tapes. Multi-tape Turing machines have several tapes, each with its own head, allowing for potentially more efficient computation for certain tasks.
  4. Non-deterministic Turing Machines: These machines can make non-deterministic choices at each step. They can explore all possible choices simultaneously, akin to a tree search, and accept if at least one of the branches leads to an accepting state.
  5. Parallel and Distributed Turing Machines: These models extend the Turing machine concept to parallel and distributed systems. They allow multiple processors to work simultaneously, communicating and coordinating their actions to perform computation.

While the universal Turing machine is a cornerstone of theoretical computer science and computation theory, these variations and extensions offer insights into different aspects of computation. The choice of model often depends on the specific problem being studied or the characteristics of the computing environment being considered. Each model has its own set of strengths and weaknesses, making them suitable for different types of analysis and applications.

Profile photo for Dr. S. Pradeep

Recurrent Neural Networks (RNNs) are a class of neural networks that are particularly powerful for processing sequential data, and there is theoretical evidence to suggest that they can act as universal approximators. This concept stems from their ability to model any dynamic system given a sufficient number of hidden units and proper weights. Siegelmann and Sontag (1995) provided a seminal contribution to this field by showing that a certain class of RNNs with sigmoid activation functions can simulate Turing machines, thus demonstrating their capability to approximate any computable function to an arbitrary degree of accuracy. This result hinges on the network's recursive nature, which allows it to maintain a form of memory by processing inputs through time, making it theoretically capable of capturing complex dynamics in data sequences. While the universal approximation property of RNNs is theoretically compelling, it's important to note that in practice, the ability to train these networks to realize such potential can be challenging due to issues like vanishing and exploding gradients. Nevertheless, advancements in architecture designs, like Long Short-Term Memory (LSTM) networks, have made significant strides in mitigating these problems, further enhancing the practical applicability of RNNs as universal approximators for sequential data.

Profile photo for Quora User

The book ‘Introduction to Machine Learning’ by Alpaydin has a very good explanation of how RBFs compare with feedforward neural nets (NNs). Read section 12.3.

Summary answer: RBFs train faster than NNs but are a less efficient model and become impractical for a huge number of input dimensions.

Detailed explanation below.

Advantage of NNs over RBFs:

  • RBFs are local representation methods while NNs are distributed representation methods. Local representations are inefficient because they require roughly one hidden node per input region, whereas in a distributed representation a smaller number of hidden nodes suffices to jointly and compactly represent the entire input space (the so-called "black box" approach).
  • Thus when the input space is huge, e.g. due to a large number of dimensions, it is better to use NNs, and RBFs become impractical. This is the curse of dimensionality.

However: Disadvantage of NNs compared to RBFs:

  • RBFs train faster because each hidden node's output weight quickly converges by winning over the other nodes' outputs when presented with the input values corresponding to that node. In training NNs, on the other hand, when you present the network with inputs of a particular region, *all* the hidden layer weights get updated, and only gradually. This prolongs weight convergence when you present inputs from all regions of the input space, in whatever order, during training.
  • Due to being a distributed representation, NNs are a black box method i.e. by looking at the network weights you cannot intuitively understand how each region of input gets transformed to the output. This is because that transformation is ‘spread’ out over all the hidden weights in a ‘mysterious’ manner that is difficult to fathom intuitively. Thus knowledge/rule extraction from NNs is very difficult. Another way of putting this is that the discriminative features that NNs extract from the data are mysterious and difficult to fathom. On the other hand, the RBFs are relatively far more white box: the RBF centers, spreads and output weights of each hidden node will directly tell you which part of the input region is handled by each hidden node and how it transforms to the output value. The discriminative features figured out by RBFs are nothing but the means and spreads of the hidden nodes. Thus knowledge/rule extraction is far easier. Within the RBF class of algorithms, the competitive methods may be even easier to understand than the cooperative methods because they are explicitly designed to make one hidden node win out over the others.
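The locality and readability points can be seen in a minimal RBF network sketch: each hidden node responds only near its own center, and each output weight is directly attributable to one node (the centers, spreads, and weights below are invented for illustration):

```python
import numpy as np

# Minimal RBF network: each hidden node covers a local region around its
# center c_j with spread s_j; the output is a weighted sum of the bumps.
centers = np.array([-1.0, 0.0, 1.0])   # roughly one node per input region
spreads = np.array([0.5, 0.5, 0.5])
weights = np.array([0.2, 1.0, -0.5])   # output weights, readable per node

def rbf_net(x):
    phi = np.exp(-((x - centers) ** 2) / (2 * spreads ** 2))
    return weights @ phi

print(rbf_net(0.0))   # dominated by the node centered at 0
```

Far from every center, all the bumps vanish and the output collapses to zero, which is exactly the locality that makes RBFs interpretable but also forces one node per region as the dimension grows.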
Profile photo for E. Z.

Yes, a neural network, is just that, a mathematical function. The simplest case, is a single input node, a weight, and an output node.

Now, when you add multiple layers, a neural network is still just a function, it’s just that it becomes a composition of functions as the signal passes from layer to layer. The weights are just parameters in the function that are fine tuned to get lower loss (typically through gradient descent).

In the earlier days of ML, we had only linear classifiers such as the linear perceptron and linear SVM. These could only classify problems when the data was linearly separable. This just meant that if you plotted the inputs and outputs as data points on a grid, you could separate two classes with a line. Here is a visual:

The left image was one where a perceptron and a linear SVM could classify the data accurately, where the right image, they could not.

Our “AI” could only learn lines, and this was just not cutting it for many real world problems.
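This limitation is easy to demonstrate numerically: without a nonlinear activation between them, stacked linear layers collapse into a single linear map, so the model can still only learn lines (the weights below are random placeholders):

```python
import numpy as np

# A composition of linear maps is still a single linear map, so a
# "deep" network with no nonlinearity has no extra expressive power.
rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(4, 2)), rng.normal(size=(1, 4))

x = rng.normal(size=2)
two_layer = W2 @ (W1 @ x)        # "deep" linear network
one_layer = (W2 @ W1) @ x        # equivalent single linear layer
print(np.allclose(two_layer, one_layer))   # True
```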

In 1989, George Cybenko showed that a neural network with a single hidden layer can approximate any continuous function on a compact subset of R^n to arbitrary accuracy.

Almost concurrently, non-linear SVMs were developed, so they too, could be used on non-linearly separable data.

This doesn’t mean a neural network is always better than a non-linear SVM, and in cases with smaller data sets, non-linear SVMs can easily outperform a neural network.

What Cybenko’s result meant, was that a shallow network (and a non-linear SVM) had the capability of learning any decision boundary (or decision surface in higher dimensions). So no more just having lines, but arbitrary curves and surfaces.

Here is a visual of a non-linear decision boundary (2-D) splitting plotted data points into two classes.

The issue here was that Cybenko showed that it was possible to ach...

Artificial neural networks are trained by being given a large number of inputs, together with the correct output for each. Kind of like the way you train a dog. You issue a command then show them what you want them to do in response. But that is basically rote recall, and an artificial neural network does something even more difficult, which is to learn a sort of "average", so that you can later give the network an input it hasn't seen before, and it will give an output close to what you expect.

How they generate an output, given an input, is by having a built-in internal model that has a lot of variables in it. If the variables are set to random values, and the model is given an input, then it will generate a random output. What the training does is search for a set of values for the variables that causes the outputs from the model to match to the correct outputs given in the training set.

In fact, training starts by giving random values to all the variables. Then all of the inputs + correct outputs are given to the model. It measures the error between what it generates versus the correct output. Then it changes the values of the variables in the direction that will reduce the sum of the errors. Then it repeats, until changes in the variables no longer reduce the summed up errors (summed over the whole training set of inputs plus correct outputs).

Now, how it decides the direction to change each variable, and the amount, is by knowing something about the model. The model built in to most artificial neural networks makes an input consist of a large number of "digits", and provides one variable for each digit that is multiplied by that digit. Then it adds together the results of all the digit times variable multiplies. That final sum is then passed through a limiter, so there is a maximum output value (both max positive and max negative).

That is how the model takes input digits and produces an output from them. It multiplies the digits by the variables then sums the results.

It turns out that with a model of this form, you can always tell whether increasing an individual variable's value will increase the error or decrease it. That is, if you give the model a set of input digits, and measure the error versus the correct output, then you can tell how much each variable contributed to the error.

So, for a given input in the training set, the network makes a recording for each variable. It measures how much that variable contributed to the error generated by the network, for that input. It also records the direction to change the variable to reduce the error for that input. It keeps a running sum of the size of contribution and the direction to change it.

At the end, for each variable, it has a sum of the individual changes that would have reduced that variable's contribution to the errors. Then it just adds that sum to the variable value. To be cautious, though, it only adds a fraction of the sum. This gives a new variable value. Then it repeats the process, until the summed error stays nearly the same two times in a row. (There are variations: some neural networks save up all the corrections and apply the sum at the end, while others, like "back propagation" change the variables after each input)

It's pretty simple, conceptually, and quite amazing that it works as well as it does.

The magic is in being able to look at a single variable and see whether increasing that variable's value will increase the error or decrease it, and by how much. A major breakthrough that started artificial neural networks was in discovering a model that has this property. This form of model is called a "linear combination", and it enables the process described above, which is called "gradient descent". The way to calculate the contribution of a variable to the error is by "taking the derivative". The derivative points in the direction of steepest change of the error, with respect to the variable differentiated upon. (Small correction: in order to make this all work, the square of the error is actually used everywhere).

Recently many variations have been discovered to this basic approach. They all have the same underlying idea of starting with a model that has many variables, and measuring the contribution of each variable to the error, then adjusting the variables and repeating. However, for some variations, this gets hidden deep down inside the math.
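The loop described above can be shown in miniature for a one-variable linear model with squared error; the data and learning rate here are invented for illustration:

```python
import numpy as np

# Gradient descent in miniature: a linear model y = w*x, squared error,
# and repeated "measure the error, nudge the variable against its
# derivative" updates, exactly as described above.
xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = 2.0 * xs                      # the "correct outputs": the true w is 2

w = 0.0                            # start from an arbitrary value
lr = 0.01                          # only apply a fraction of the correction
for _ in range(200):
    err = w * xs - ys              # error on every training example
    grad = 2 * np.mean(err * xs)   # derivative of mean squared error w.r.t. w
    w -= lr * grad                 # step in the direction that reduces error

print(round(w, 4))   # converges close to 2.0
```

Note the squared error in the gradient line, matching the "small correction" in the answer: the square is what makes the derivative point cleanly toward the minimum.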

Sean

Profile photo for Joe Porter

We need neural networks in machine learning because they outperform every other model in two of the most important and useful applications: natural language processing (including speech recognition) and computer vision. Neural networks require large amounts of high quality labeled data but they are extremely good at extracting features from that data. The hidden layers of a deep neural network can create hierarchical representations of data that enable emergent capabilities. These emergent capabilities are why AI has made such extraordinary progress over the past decade. Most neural networks are also easily computed on high performance parallel processors such as GPUs, while other algorithms are impossible to compute efficiently on large datasets. Neural networks also tend to generalize better on new data because of how the hidden layers learn. However, neural networks do not work well on small and noisy (low quality) datasets. Still, an AI built on neural network technology could pick an appropriate machine learning algorithm for any given dataset and then train that model and present the results in a way humans can understand. This is not true for any other machine learning algorithm.

Profile photo for Yassine Alouini

This is a hand-wavy answer to a mathematical notion. You should check the reference for more details.

I guess you are referring to this theorem [1].

The Universal approximation theorem roughly states that “simple” neural networks can approximate “many” functions.

Notice that there are some restrictions on the family of functions you can approximate using a single hidden layer neural network (continuous functions [ https://en.wikipedia.org/wiki/Continuous_functions ] on compact subsets [ https://en....

Profile photo for Alberto Bietti

Here’s a simple neural network function based on ReLUs, to go along with the nice empirics in Jerry Liu‘s answer. I’ll consider [math]\max(x,y)[/math] to make things simpler, but the same can be done for [math]\min[/math] by switching some signs. I’ll try to make some (speculative) comments on what could make it easier or harder to learn with a gradient method.

For positive [math]x[/math] and [math]y[/math], we have

[math]\max(x, y) = 0.5 (\max(x - y, 0) + \max(y - x, 0) + \max(x + y, 0))[/math]

Indeed, the sum of the first two terms gives you the difference between the larger number and the smaller one, and adding the third term cancels out the smaller number. This just needs one hidden layer with 3 ReLU units, and all weights are on a similar scale, which makes this a possibly simple network to learn with a gradient method.

Note that ideally you’d just want the third term to be [math]x + y[/math], so that you’re not limited to positive numbers (or more precisely, numbers with a positive sum), and if you were to allow some units with “identity” activations, then this would be possible. But if you restrict yourself to 3 ReLUs and want to get a similar behavior on this term, you’d need to use bias terms to approximate [math]x + y[/math] by something like

[math]x + y = \max(b + x + y, 0) - b,[/math]

with some large enough value of [math]b[/math]. Let’s say all other weights are fixed to the right values, then learning such a parameter [math]b[/math] in this parameterization gives a very hard optimization problem! Indeed, [math]b[/math] now needs to become potentially large, depending on the minimum value of [math]x + y[/math] in your data. Additionally the first and second [math]b[/math] in the above are typically two separate bias parameters at different layers, and learning the first one can be difficult since the ReLU is a flat 0 whenever [math]b[/math] is too small, so that it’s difficult to get a gradient that would make it larger.

Allowing a 4th hidden unit might make the problem easier, since you can then write:

[math]x + y = \max(x + y, 0) - \max(-x - y, 0)[/math],

which might be easier to learn since the weights needed are again in a small range.
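The 3-ReLU identity from the answer is easy to check numerically (a sketch, assuming positive inputs as the answer does):

```python
import numpy as np

# Numerical check of the identity in the answer: for x + y >= 0,
#   max(x, y) = 0.5 * (relu(x-y) + relu(y-x) + relu(x+y)),
# i.e. a one-hidden-layer network with 3 ReLU units.
def relu(z):
    return np.maximum(z, 0.0)

def max_net(x, y):
    return 0.5 * (relu(x - y) + relu(y - x) + relu(x + y))

rng = np.random.default_rng(0)
pts = rng.uniform(0, 10, size=(1000, 2))       # positive x, y
print(np.allclose(max_net(pts[:, 0], pts[:, 1]),
                  np.maximum(pts[:, 0], pts[:, 1])))   # True
```

The first two ReLUs sum to |x - y| and the third contributes x + y, so the network reproduces max exactly wherever x + y is nonnegative, and fails outside that region, as the answer discusses.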

Profile photo for Prasoon Goyal

No, there’s a lot more to ML than just neural networks. But unfortunately, that’s how it is being marketed these days. See Prasoon Goyal's answer to Why is SVM still used in machine learning when Neural Networks are much more accurate? for limitations of neural networks.

Here’s a list of some of the central ideas/topics of ML that I compiled [Prasoon Goyal's answer to How do I learn Machine Learning in 10 days?]

Day 1:

  • Basic terminology:
    1. Most common settings: Supervised setting, Unsupervised setting, Semi-supervised setting, Reinforcement learning.
    2. Most common problems: Classification (binary & multiclass), Regression, Clustering.
    3. Preprocessing of data: Data normalization.

Day 2:

    1. Terminology & Basic concepts: Convex optimization, Lagrangian, Primal-dual problems, Gradients & subgradients, [math]\ell_1[/math] and [math]\ell_2[/math] regularized objective functions.
    2. Algorithms: Batch gradient descent & stochastic gradient descent, Coordinate gradient descent.
    3. Implementation: Write code for stochastic gradient descent for a simple objective function, tune the step size, and get an intuition of the algorithm.
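The Day-2 implementation exercise can be sketched in a few lines; a minimal example fitting a one-parameter least-squares objective with SGD (the data, step size, and epoch count are invented for illustration):

```python
import random

# Toy data: y = 3x exactly, so SGD on squared error should drive w toward 3.
data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0, 2.5]]

def sgd(data, w=0.0, lr=0.05, epochs=200, seed=0):
    rng = random.Random(seed)
    for _ in range(epochs):
        x, y = rng.choice(data)          # one random sample per step: "stochastic"
        grad = 2.0 * (w * x - y) * x     # d/dw of (w*x - y)^2
        w -= lr * grad                   # step against the gradient
    return w

w = sgd(data)
```

Tuning `lr` up or down by a factor of 10 and watching the iterates diverge or crawl is the quickest way to build the intuition the exercise asks for.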

Day 3:

  • Classification:
    1. Logistic Regression
    2. Support vector machines: Geometric intuition, primal-dual formulations, notion of support vectors, kernel trick, understanding of hyperparameters, grid search.
    3. Online tool for SVM: Play with this online SVM tool (scroll down to “Graphic Interface”) to get some intuition of the algorithm.
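The kernel trick mentioned under SVMs can also be checked directly: for a degree-2 polynomial kernel, evaluating (x·z + 1)² in the input space gives the same number as an inner product in an explicitly expanded feature space. The 2-D feature map below is one standard choice, written out for illustration:

```python
import math

def poly_kernel(x, z):
    # (x.z + 1)^2, computed directly in the 2-D input space.
    return (x[0] * z[0] + x[1] * z[1] + 1.0) ** 2

def phi(x):
    # Explicit degree-2 feature map whose inner product reproduces the kernel.
    a, b = x
    return [a * a, b * b, math.sqrt(2) * a * b,
            math.sqrt(2) * a, math.sqrt(2) * b, 1.0]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))
```

The point of the trick is that `poly_kernel` never materializes the 6-dimensional vector, which matters once the expanded space becomes huge or infinite (as with the RBF kernel).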

Day 4:

  • Regression:
    1. Ridge regression
  • Clustering:
    1. k-means & Expectation-Maximization algorithm.
    2. Top-down and bottom-up hierarchical clustering.
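The k-means step of Day 4 can be sketched as a plain-Python version of Lloyd's algorithm (the toy 1-D data and starting centers are invented for illustration; in practice you would reach for a library implementation):

```python
def kmeans_1d(points, centers, iters=10):
    # Alternate assignment and mean-update steps (Lloyd's algorithm).
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            # Assignment step: attach each point to its nearest center.
            idx = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its cluster
        # (keep the old center if its cluster came up empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

centers = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], [0.0, 5.0])
```

The same alternate-and-update structure reappears in Day 5's Expectation-Maximization, where hard assignments become soft responsibilities.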

Day 5:

  • Bayesian methods:
    1. Basic terminology: Priors, posteriors, likelihood, maximum likelihood estimation and maximum-a-posteriori inference.
    2. Gaussian Mixture Models
    3. Latent Dirichlet Allocation: The generative model and basic idea of parameter estimation.

Day 6:

  • Graphical models:
    1. Basic terminology: Bayesian networks, Markov networks / Markov random fields.
    2. Inference algorithms: Variable elimination, Belief propagation.
    3. Simple examples: Hidden Markov Models. Ising model.

Days 7–8:

    1. Basic terminology: Neuron, Activation function, Hidden layer.
    2. Convolutional neural networks: Convolutional layer, pooling layer, Backpropagation.
    3. Memory-based neural networks: Recurrent Neural Networks, Long-short term memory.
    4. Tutorials: I’m familiar with this Torch tutorial (you’ll want to look at [math]\texttt{1_supervised}[/math] directory). There might be other tutorials in other deep learning frameworks.

Day 9:

  • Miscellaneous topics:
    1. Decision trees
    2. Recommender systems
    3. Markov decision processes
    4. Multi-armed bandits

Day 10: (Budget day)

  • You can use the last day to catch up on anything left from previous days, or learn more about whatever topic you found most interesting / useful for your future work.
Profile photo for Aditya Parikh

There are several advanced algorithms that are not NN-, SVM-, or decision-tree-based, and they can often outperform them. Some of those I can remember are:

  1. Gradient Boosting - Popular ensemble method, often used with weak predictive models.
  2. KNN - Non-parametric approach that classifies based on similarity to the nearest training points.
  3. Naive Bayes - Based on Bayes' theorem; simple yet powerful, used mainly for large datasets.
  4. Random Forest, maybe - It is not a single decision tree, but a combination of many trees.
  5. PCA - Again a popular unsupervised algorithm, mostly underutilised; very useful when working with huge, complex data.
Profile photo for Quora User

The field of artificial neural networks is extremely complicated and readily evolving. In order to understand neural networks and how they process information, it is critical to examine how these networks function and the basic models that are used in such a process.

What are artificial neural networks?

Artificial neural networks are parallel computational models (unlike our computers, which have a single processor to collect and display information). These networks are commonly made up of multiple simple processors which are able to act in parallel alongside one another to model changing systems. This parallel computing process also enables faster processing and computation of solutions. Neural networks follow a dynamic computational structure, and do not abide by a simple process to derive a desired output.

The basis for these networks originated from the biological neuron [1] and neural structures - every neuron takes in multiple unique inputs and produces one output. Similarly, in neural networks, different inputs are processed and modified by a weight, or a sort of equation that changes the original value. The network then combines these different weighted inputs with reference to a certain threshold and activation function and gives out the final value.

How do neural networks operate?

Artificial neural networks are organized into layers of parallel computing processes. For every processor in a layer, each of the number of inputs is multiplied by an originally established weight, resulting in what is called the internal value of the operation. This value is further changed by an originally created threshold value and sent to an activation function to map its output. The output of that function is then sent as the input for another layer, or as the final response of a network should the layer be the last. The weights and the threshold values are most commonly modified to produce the correct and most accurate value.

The learning mechanisms of a neural network

Looking at an analogy may be useful in understanding the mechanisms of a neural network. Learning in a neural network is closely related to how we learn in our regular lives and activities - we perform an action and are either accepted or corrected by a trainer or coach to understand how to get better at a certain task. Similarly, neural networks require a trainer in order to describe what should have been produced as a response to the input. Based on the difference between the actual value and the value that was outputted by the network, an error value is computed and sent back through the system. For each layer of the network, the error value is analyzed and used to adjust the threshold and weights for the next input. In this way, the error becomes marginally smaller with each run as the network learns how to analyze values.

The procedure described above is known as backpropagation, and is applied repeatedly through the network until the error value reaches a minimum. At this point, the neural network no longer requires such a training process and is allowed to run without adjustments. The network may then finally be applied, using the adjusted weights and thresholds as guidelines.
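The loop described above can be reduced to a single sigmoid neuron with a squared-error-style update; this is only a sketch, with the data, learning rate, and iteration count invented for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One neuron learning to output 1.0 for input 1.0 (a one-example "training set").
w, b, lr = 0.0, 0.0, 1.0
x, target = 1.0, 1.0
for _ in range(500):
    out = sigmoid(w * x + b)          # forward pass
    err = out - target                # difference from the desired response
    grad = err * out * (1.0 - out)    # error propagated back through the sigmoid
    w -= lr * grad * x                # adjust the weight...
    b -= lr * grad                    # ...and the threshold (bias) to shrink the error
```

With many layers, the same `grad` quantity is passed backwards and multiplied through each layer's weights, which is where the name backpropagation comes from.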

The usage of a neural network while running

When a neural network is actively running, no backpropagation takes place, as there is no way to directly verify the expected response. Instead, the validity of the outputs is corrected during a new training session, or they are left as is for the network to run. Many adjustments may need to be made, as the network consists of a great number of variables that must be precise for the artificial neural network to function.

A basic example of such a process can be examined by teaching a neural network to convert text to speech. One could pick multiple different articles and paragraphs and use them as inputs for the network, and predetermine a desired output before running the test. The training phase would then consist of going through the multiple layers of the network and using backpropagation to adjust the parameters and threshold value of the network in order to minimize the error value for all input examples. The network may then be tested on new articles to determine if it could truly convert text to proper speech.

Networks like these may be viable models for a great array of mathematical and statistical problems, including but not limited to speech synthesis and recognition, face recognition and prediction, nonlinear system modeling and pattern classification.

Conclusion

Neural networks are a relatively new concept whose potential we have just scratched the surface of. They may be used for a variety of different concepts and ideas, and they learn through a specific mechanism of backpropagation and error correction during the training phase. By properly minimizing the error, these multi-layered systems may one day be able to learn and conceptualize ideas alone, without human correction.

Hope this helps! Please feel free to comment on this answer, A2A or PM me if you have any further questions, comments or concerns. Thank you!

[1] If you'd like to know more about how a neuron functions biologically, check out my answer to What is a neuron?

Sources

Most of this answer was from my knowledge through online courses and reading multiple different articles and papers regarding neural networks, but I used the following resources to enhance the quality of this answer:

[1] http://www.cheshireeng.com/Neuralyst/nnbg.htm
[2] http://www.scientificamerican.com/article.cfm?id=experts-neural-networks-like-brain

Profile photo for Yariv Adan

Possible candidates:

  • Supervised Learning
  • Unsupervised Learning and Semi-Supervised Learning
  • (Deep) Reinforcement Learning
  • Generative Learning - RNNs, Autoencoders, GANs (though these are often subclasses of deep learning)
  • Ensemble Learning
  • SVM
  • Decision Trees
  • Perceptrons
  • Multi-layer Perceptrons / feed-forward NNs
  • Boltzmann Machines and Restricted Boltzmann Machines
  • Belief Networks

I am sure I am forgetting some. But then I recalled this:

Profile photo for Jeremy Singer

Machine learning models try to find values of parameters that will predict outcomes given input data. They do this by looking at training data, trying out different values for the parameters, and measuring how different the results are from the training data. This difference is the loss function. They jitter the parameters slightly and see if the results improve or worsen. This is called minimizing the loss.

These cycles of calculation are done many times.

After creating a model from training data, it is tested on different data.

Bootstrapping reuses one set of labelled data by splitting it into different parts, so that at different times some data is used for training a model and at other times for testing.
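The resampling idea in the last paragraph can be sketched with the standard library; this shows the bootstrap flavour in which items are drawn with replacement for training and the untouched items serve as a test set (the helper name is my own):

```python
import random

def bootstrap_split(data, seed=0):
    # Draw len(data) items with replacement for training; items never
    # drawn (the "out-of-bag" items) become the test set.
    rng = random.Random(seed)
    train = [rng.choice(data) for _ in data]
    test = [d for d in data if d not in train]
    return train, test

train, test = bootstrap_split(list(range(10)))
```

Repeating the split with different seeds lets each labelled example play both roles across runs, which is what "reusing" the data means here.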

Profile photo for Shanmugasundaram Muthuswamy

The best way to learn Neural Network (known as Deep Learning nowadays) is to understand the basics thoroughly. Some of the best books to start with are

  1. Artificial Neural Networks - Prof. Yegna Narayana Artificial Neural Networks: B. Yegnanarayana: 9788120312531: Amazon.com: Books
  2. Neural Networks - A Comprehensive Foundation by Prof. Simon Haykin Neural Networks: A Comprehensive Foundation (2nd Edition): Simon Haykin: 9780132733502: Amazon.com: Books
  3. After learning the fundamentals, try implementing some applications (which use ANN) using Python or C and you can easily master the subject in due course. Best wishes !!!
Profile photo for Devender Shekhawat

Deep learning is a part of Machine Learning - a bit advanced, though. Before learning either of them, you should first brush up on the following mathematics concepts:

  1. Good knowledge of statistics and probability. Khan Academy has a nicely curated library of video lectures on statistics and probability.
  2. Basic knowledge of calculus, like partial derivatives and integrals. Again, Khan Academy.

Give maths at least one month. That's what I did before moving on to Machine Learning. After that you can move on to basic and advanced algorithms and concepts of Machine Learning.

You must know how to code in Python.

Udacity's Intro to Machine Learning is a good course. After this course you can gain more knowledge of each concept from other online resources.

Once you are confident in what you have learnt, you can move on to deep learning.

Profile photo for Yuki Yoho

The artificial neural network is an effective information processing system. It is inspired by how the human brain functions. The key element for any neural network is its novel structure for its information processing system. Many highly interconnected processing computing elements work together for solving specific problems.

They are based on how human minds work. Hence they can learn by example, just as we do. A specific neural network is configured for a particular application, like pattern recognition, through a learning process. Just as we adapt to biological situations by adjusting the synaptic connections between our neurons, the same holds true for neural networks. ANNs can be applied to many real-world problems of varying complexity. Problems that cannot be solved by conventional methods, or that don't have algorithmic solutions, can be solved using ANNs. There are many benefits of using neural networks in machine learning -

1. Non-Linear Data Processing -

Nonlinear systems can find shortcuts to reach computationally expensive solutions. Such systems can also infer connections between different data points, rather than waiting for records in a data source to be manually linked. This helps in easing commercial big-data analysis.

2. Self-Repair -

The biggest advantage of ANN systems is that they can find specific data that is no longer communicating and regenerate large chunks of data by inference. This helps in understanding which node is not working. This trait comes in handy for networks that need to inform users about the current state of the network. It can also self-debug and diagnose network problems, which makes it very convenient.

3. Fault Tolerance -

ANN has a high potential for high fault tolerance. Even if a few cells or information bits go missing, it won’t affect the operational capabilities of the artificial neural network.

4. Organic Learning -

The most important advantage is that ANN can learn organically. This is promising as this means that the expected outputs from ANN aren’t limited by the inputs and results given to them initially by an expert system. ANN can generalize the inputs, which is super helpful for robotics and pattern recognition systems.

Profile photo for Mayank Tewari

The question is fascinating, and it makes many people curious about neural nets.
There are some really good answers here covering the mathematical theory and biological analogy behind neural nets; I will try to give a simple hypothetical example, which may complement the explanations above.


Problem:
Suppose I am an Android with an Intelligence based on Neural Net theory, taking my First Exam:
  • I am required to answer 10 Questions.
  • To Pass, I need to get 4 Correct.
The Problem is:
  • I lack Knowledge of the Subject.

Test 1:
  • I am able to answer only 1 Question correctly, and thus I failed.

Mathematical Modelling:

Now, let us see what happened in the exam and give it a mathematical base:
  • 10 Questions # Input
  • 1 correct attempt, thus I currently have only 10% Knowledge of the Subject.
  • This could be written as 0.1, quantifying my current Knowledge Level. # This is the Weight, often called the Synaptic Weight if you follow the biological analogy; it indicates the current 'stored Knowledge' of the Neural Net.
  • The Success Criteria is clear:

if Correct Attempts >= 4, then Pass; else Fail.

Mathematically,
f(x) = 1, if x >= 4
f(x) = 0, if x < 4

# This is the Activation Function or Transfer Function, which assesses my performance and produces an Outcome, Pass or Fail in this example.

  • Based on the above function, the Outcome for this First Exam is 0, which shows I Failed.


ANN Learning:

Now, the above Outcome of Exam 1 can be quantified as an Error:
Desired Outcome, T = 1 (Pass)
Actual Output, O = 0 (Fail)

Thus, in simple terms, Error = (T − O) = 1

The positive Error shows a need to Learn and make a change to the current Knowledge Level of 0.1.

This incremental change (say, ∆W) can be quantified as a function of the Error, like:
∆W = α × (T − O)

where α is the rate or pace at which I learn, called the Learning Rate. Based on this, I will learn and make changes in my current Knowledge Level of 0.1, denoted as 'Weight'.

Mathematically, let us say I learn at a rate of 10%, thus α = 0.1:

∆W = 0.1 × (1 − 0) = 0.1

W (after learning) = W + ∆W = 0.1 + 0.1 = 0.2

Hereby, learning from my Failure (Error), I increased my Knowledge Level (new Weight) by 10% (my Learning Rate) to 0.2.

In Test 2 ,

If given 10 Questions from the same subject, I will be able to do 2 Questions (0.2 × 10) correctly based on my Learning. But, as I am still unable to Pass the Test (4 correct Questions), I will keep on learning in the same way as above till I get the Desired Result (Pass).


This was an oversimplified example (with many gaps) to intuitively understand what we mean by saying 'ANN Learning'. It is loosely based on a Single-Perceptron NN with a Threshold / Hard-Limit activation function, learning by the Perceptron Learning Rule algorithm.
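The exam story above maps directly onto a few lines of code; a sketch of the update rule it describes, with the numbers taken from the example:

```python
def activation(correct_attempts):
    # Threshold / hard-limit function: passing needs at least 4 correct.
    return 1 if correct_attempts >= 4 else 0

w = 0.1        # current "knowledge level" (the weight)
alpha = 0.1    # learning rate: I learn at a pace of 10%
target = 1     # desired outcome T: pass
history = []
while True:
    output = activation(w * 10)      # 10 questions per test
    history.append(round(w, 1))
    if output == target:
        break
    w += alpha * (target - output)   # perceptron update: delta-W = alpha * (T - O)
```

Each failed test bumps the knowledge level by 0.1 until the threshold of 4 correct answers is reached, matching the 0.1 → 0.2 → 0.3 → 0.4 progression in the story.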

Profile photo for Aarush Mohit, Ph.D.

Neural networks in machine learning, or Artificial Neural Networks (ANNs), are models designed to mimic the human neural network, or more precisely the brain. An ANN uses a multi-layered network of several units to get insights from the data. It has an input layer, several hidden layers, and an output layer. The number of units in the input layer depends on the number of features you have in your data. The output layer has 1 or more units, depending on how many outcomes you need to predict. The hidden layers can have any number of units, and there can be any number of hidden layers as well, depending on the requirements for training.

There are several tools and frameworks that you can use for implementing neural networks in Python. Some of the most common ones are:

  1. TensorFlow
  2. PyTorch
Profile photo for LTDC Team

An activation function is a function in between the input feeding the current neuron and its output going to the next layer. They basically decide whether the neuron should be activated or not.

Without the activation function, the weights and bias would simply do a linear transformation. A linear equation is simple to solve but is limited in its capacity to solve complex problems and has less power to learn complex functional mappings from data. A neural network without an activation function is just a linear regression model.

There are two types of activation functions:

  • Linear activation function (Identity)
  • Non – Linear activation function

Linear activation function

It takes the inputs, multiplied by the weights for each neuron, and creates an output signal proportional to the input.

Z = W1X1 + W2X2 + b

Range: (-∞, ∞)

Non – Linear activation function

Neural network models use non-linear activation functions. They allow the model to create complex mappings between the inputs and outputs of the network, which are essential for learning and modeling complex data, such as images, audio, video.

Some commonly used non-linear activation functions are,

1. ReLU (Rectified Linear Unit)

f(z) = max(0, z)

Range: [0, ∞)

2. Sigmoid activation function

3. tanh activation function
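The three non-linear activations listed above can be written out directly (a quick sketch using only the standard library):

```python
import math

def relu(z):
    # max(0, z): passes positives through unchanged, zeroes out negatives.
    return max(0.0, z)

def sigmoid(z):
    # Squashes any real z into (0, 1); handy for probabilities.
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    # Squashes any real z into (-1, 1); zero-centred, unlike the sigmoid.
    return math.tanh(z)
```

All three are non-linear, which is what lets stacked layers express more than a single linear (regression-style) transformation.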

Profile photo for Anirudh Sharma

In Deep Learning, artificial neural networks play an important role in building any model. Artificial Neural Networks work on the basis of the structure and functions of a human brain. A human brain consists of neurons that process and transmit information between themselves. There are dendrites that receive inputs. Based on these inputs, they produce an output through an axon to another neuron. This is what a biological neural network looks like:

The term neural network was derived from the work of Warren S. McCulloch and Walter Pitts. These networks consist of artificial neurons called nodes that process information and perform operations. There are 3 layers present in a neural network:

  • Input Layer: This layer takes large volumes of input data in the form of text, numbers, audio files, image pixels, etc.
  • Hidden Layers: Hidden layers are responsible for performing mathematical operations, pattern analysis, feature extraction, etc. There can be multiple hidden layers in a neural network.
  • Output Layer: This layer is responsible for generating the desired output.

This is what an artificial neural network looks like:

An artificial neural network consists of several parameters and hyperparameters that drive the output of a neural network model. Some of these parameters are weights, biases, number of epochs, the learning rate, batch size, number of batches, etc.

Each node in the network has some weights assigned to it. A transfer function is used to calculate the weighted sum of inputs and a bias is added.

The result of the transfer function is fed as input to an activation function. Activation functions decide which nodes to fire. There are various activation functions used for specific purposes, based on the type of output you are looking for. Some of the activation functions are Sigmoid, Step (Threshold), ReLU (Rectifier), Softmax, the Hyperbolic Tangent function, etc. All these activation functions have specific usages. Some of the functions are used only in the hidden layers, some in the output layers, and there are a few which are used in both the hidden and output layers.

Based on the fired nodes, the output layer produces the final predicted output. But there might be errors in the predicted output, or the predicted output might vary too much compared to the original output.

Backpropagation is a technique used to minimize the error in the network. This error is calculated with the help of a cost function. We backpropagate the error and adjust the weights to reduce it. This process is repeated several times, until the difference between the predicted and original output is as small as possible.

In the above neural network, we are trying to classify the images of cats and dogs based on their image pixels. The cost function is calculated to measure the error in the network and this error is backpropagated and the weights are adjusted to minimize the cost function.

This is the overall explanation of how a neural network works.

Here is an interesting animated video to learn What is a Neural Network:

Profile photo for Rhys Olsen

Are neural networks a good way to setup machine learning?

NNs can (in principle) approximate any computable function with enough layers and a suitable training regime.

They’re popular because various NN architectures have set records for precision and recall across a gamut of learning tasks. They’ve replaced log-linear models and kernelized support vector machines as go-to blenders when there’s tons of data you want to represent and don’t want to craft explicit features. In fact, NNs are often good for automatically crafting features to then use in simpler models.

In practice, they’re expensive to train and execute, and they have gone mainstream because backpropagation is an effective-ish way to train them, and computer hardware (especially memory-intensive multicore systems) can finally run them at scales that allow them to outshine alternatives. They require tons of training data to function well, lots of architectural tweaking to get good results (more or less for anything other than FFNNs), and can’t really be explained effectively except in terms of the hierarchy of perceptrons they embody (in contrast to old, “boring” models of data, like decision trees, first-order logic models, various forms of regression, hidden Markov models, etc., which can be readily explained in terms of simple operating principles). This disqualifies them from lots of applications (including almost any science where taking data is a manual process, and any application where you need the ability to tweak or explain particular parameters, or prove your instance is the best in a family of models). In addition, the truly impressive “exotic” training regimes and models require principles well beyond the scope of typical AI practitioners, including regret and decision theory, variational principles, graphical models, Lie theory, spin ensembles, etc.

In short, they’re very effective tools for particular tasks and are indispensable in a few narrow areas of importance. Using one when you don’t understand why and how you’re using it is a horrible waste of your time and resources. For 99% of applications, you should use the simplest model that effectively explains your dataset, which, for many datasets, is emphatically not an NN. Think of a model’s complexity as a piece of luggage you have to take with you everywhere. Pack light.
