Profile photo for Reza Borhani

We actually do use other function approximators: in fact, polynomials were the first provable universal approximators, this having been shown in 1885 by the Weierstrass approximation theorem (later generalized as the Stone–Weierstrass theorem).

The Fourier basis (and its discrete derivatives) is another extremely popular function approximation tool, used particularly in physics, signal processing, and engineering fields.

These function approximators work fine, especially in low input dimensions. As the input dimension [math]N[/math] increases, however, so does the number of basis elements in a polynomial or Fourier basis, and it does so combinatorially fast. For example, the number of polynomial terms [math]M[/math] in a degree [math]D[/math] polynomial is given by

[math]M=\binom{N+D}{D}[/math]

Fortunately, there’s a remedy here called the kernel trick. This is how, for example, SVMs work. Now there are serious scaling problems with kernel methods but I think I’m starting to get off topic here.
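
As a quick illustration of that growth (my own sketch, not part of the answer; the degrees and dimensions are arbitrary), the count [math]M[/math] can be computed with `math.comb`, and a polynomial kernel shows how the same feature space can be used implicitly:

```python
from math import comb

import numpy as np

D = 5                                   # polynomial degree (arbitrary choice)
for N in (2, 10, 100, 1000):            # input dimension
    print(N, comb(N + D, D))            # number of basis terms M = C(N+D, D)

# The degree-D polynomial kernel (x.y + 1)^D corresponds to an inner product in
# that M-dimensional feature space, yet costs only O(N) to evaluate directly.
rng = np.random.default_rng(0)
x, y = rng.normal(size=1000), rng.normal(size=1000)
print((x @ y + 1.0) ** D)
```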

Here’s a simple experiment showing how one can use polynomial, Fourier, or neural network bases for function approximation.
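
The original experiment is not reproduced here, but a minimal sketch of the same idea might look like the following (my own code; the target function, polynomial degree, number of Fourier terms, and network size are arbitrary choices):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
y = np.sin(3 * np.pi * x) + 0.1 * rng.normal(size=x.size)   # noisy 1-D target

# Polynomial basis, degree 9, fit by least squares
p = np.polyfit(x, y, deg=9)
poly_pred = np.polyval(p, x)

# Fourier basis: [1, cos(k*pi*x), sin(k*pi*x)] for k = 1..5
K = 5
F = np.column_stack([np.ones_like(x)]
                    + [np.cos(k * np.pi * x) for k in range(1, K + 1)]
                    + [np.sin(k * np.pi * x) for k in range(1, K + 1)])
c, *_ = np.linalg.lstsq(F, y, rcond=None)
fourier_pred = F @ c

# Small neural network
net = MLPRegressor(hidden_layer_sizes=(50,), max_iter=5000, random_state=0)
net_pred = net.fit(x.reshape(-1, 1), y).predict(x.reshape(-1, 1))

for name, pred in [("poly", poly_pred), ("fourier", fourier_pred), ("net", net_pred)]:
    print(name, np.mean((pred - y) ** 2))   # training mean-squared error
```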

Profile photo for Mona Jalal

Here is a list of ML approaches that can be used as function approximators:

  1. Regression tree
  2. Kernel regressors
  3. SVM
  4. Gaussian Process/Gaussian function mixture
  5. Random forest
  6. Kernel density estimator
  7. Radial basis function network

In case you are interested, here is a list of non-ML approaches:

  1. Polynomial fitting
  2. Taylor series
  3. Fourier series
  4. Wavelets
  5. Lagrange interpolation
  6. Chebyshev series

Profile photo for Assistant

Yes, there are several universal function approximators beyond neural networks. Here are a few notable examples:

  1. Polynomial Functions: Polynomial approximators can represent a wide range of continuous functions on a closed interval. According to the Weierstrass approximation theorem, any continuous function defined on a closed interval can be uniformly approximated by polynomials.
  2. Radial Basis Functions (RBF): RBF networks utilize radial basis functions as activation functions. They can approximate any continuous function and are particularly useful for interpolation problems.
  3. Support Vector Machines (SVM): With appropriate kernel functions, SVMs can approximate complex decision boundaries and are capable of handling high-dimensional data. The kernel trick allows SVMs to implicitly map input features into higher-dimensional spaces.
  4. Gaussian Processes (GP): GPs are a non-parametric method that can model functions and provide uncertainty estimates. They can approximate a wide variety of functions and are especially useful in Bayesian optimization.
  5. Decision Trees and Random Forests: These models can approximate complex functions by partitioning the input space into regions and making predictions based on the average output in those regions. They are particularly effective for both regression and classification tasks.
  6. Kernel Methods: Apart from SVMs, kernel methods in general can be used for function approximation by mapping input data into high-dimensional feature spaces.
  7. Fourier Series: Any periodic function can be approximated by a series of sine and cosine functions, making Fourier series a powerful tool in signal processing and other applications.

These methods have their own strengths and weaknesses, and the choice of approximator often depends on the specific characteristics of the problem being addressed, such as the nature of the data, the desired accuracy, and computational efficiency.
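
As a rough illustration (not part of the answer above), two of these methods, an RBF-kernel SVM and a Gaussian process, can be fit to the same 1-D data with scikit-learn; the target function and hyperparameters below are arbitrary:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, size=(80, 1)), axis=0)
y = np.sinc(X).ravel() + 0.05 * rng.normal(size=80)     # arbitrary 1-D target

svr = SVR(kernel="rbf", C=10.0).fit(X, y)
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=0.05**2).fit(X, y)

X_test = np.linspace(-3, 3, 200).reshape(-1, 1)
mean, std = gp.predict(X_test, return_std=True)          # the GP also returns uncertainty
print(svr.predict(X_test)[:5], mean[:5], std[:5])
```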

Profile photo for Quora User

This might not be a satisfactory answer, but neural networks are in essence a summation of sigmoid-like functions which are stacked on top of each other.

In some sense, the Fourier series is a universal function approximator… especially if your universe is the set of periodic functions.

And in another sense, as long as you zoom in far enough on some local space, most learning algorithms can approximate a function within that region of the function.

Take it how you will! Hope it was helpful.

Profile photo for Kevin Cameron

This is a question that I asked many years ago because I was interested in building a SPICE simulator that could reduce the transistor equations for a block to a single equation for the block (which eliminates awkward internal nodes). Unfortunately the only technique I’ve seen so far uses neural network models.

An alternative technique for SPICE is to use table-driven models — precalculate answers for particular points and interpolate.

Which techniques work will depend heavily on the data sets and algorithms used. You might be able to use genetic algorithms to guess at a short form function and then use SAT solvers to validate it.
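
As a toy illustration of the table-driven idea (my own sketch, not the actual SPICE code; `expensive_model` is a made-up stand-in for the full transistor equations):

```python
# Precompute a device curve on a grid, then interpolate at simulation time.
import numpy as np

def expensive_model(v):                 # stand-in for the full transistor equations
    return 1e-3 * (np.exp(v / 0.025) - 1.0)

v_grid = np.linspace(0.0, 0.7, 71)      # precalculated operating points
i_grid = expensive_model(v_grid)        # build the lookup table once

def table_model(v):                     # cheap linear interpolation at run time
    return np.interp(v, v_grid, i_grid)

v = 0.4137
print(table_model(v), expensive_model(v))   # compare table vs. exact
```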

Profile photo for Joseph Bills

There are plenty. There are even ones more powerful than neural networks, like an algorithm that simply tests every computable function in sequence until it finds one that generates the given data set; these are just far too inefficient to be used in practice. One that is faster than neural networks, though, is polynomial regression.

Profile photo for Yassine Alouini

This is a hand-wavy answer to a mathematical notion. You should check the reference for more details.

I guess you are referring to this theorem.

The Universal approximation theorem roughly states that “simple” neural networks can approximate “many” functions.

Notice that there are some restrictions on the family of functions you can approximate using a single hidden layer neural network (continuous functions on compact subsets of [math]\mathbb{R}^n[/math]).

There are also restrictions on the activation functions.

Finally, the algorithm complexity could be exponential, making the implementation practically infeasible. Check computational learning theory and big O notation for more details.

For a rigorous proof using a sigmoid activation, you should check this paper.

Good luck.
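
To make the statement concrete, here is a small sketch (mine, not from the answer) of the theorem's setting: a single-hidden-layer network with sigmoid ("logistic") units fit to a continuous function on a compact interval, with the approximation error typically shrinking as the hidden layer grows. Widths and iteration counts are arbitrary.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

x = np.linspace(0.0, 1.0, 500).reshape(-1, 1)   # compact subset of R
y = np.sin(2 * np.pi * x).ravel()               # continuous target

for width in (5, 20, 100):
    net = MLPRegressor(hidden_layer_sizes=(width,), activation="logistic",
                       max_iter=20000, random_state=0).fit(x, y)
    err = np.max(np.abs(net.predict(x) - y))    # sup-norm error on the grid
    print(width, err)                           # error typically shrinks with width
```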

Profile photo for Chris von Csefalvay

A neural network is a model. A model is basically a function that maps an input to an output, coupled with a mechanism to learn this mapping by minimising a function that describes the distance between a predicted output and an observed value (the loss function). In this sense, any other model is an “alternative” to a neural network: from simple linear regressors through GLMs to support vector machines, decision trees, random forests, you name it.

On the other hand, functionally, what are the alternatives to neural networks? In the fields where neural networks perform best — convolutional neural networks in image recognition, neural GANs for generating stuff, transformer models — there are few truly viable competitors. The competitors of most neural networks are other neural networks. In this sense, neural models are a “metamodel” structure the same way generalised linear models or “linear modeling” or “linear programming” are a family of approaches.

The reason for the dominance of neural networks is that they ultimately boil down to relatively simple mathematics. Neural networks are nothing new — the theoretical basis has been around since Hebb’s work in the 1940s. The computational infrastructure that emerged to do computer graphics (and computer games) has been repurposed to do the calculations to train neural networks (GPGPU or ‘general purpose computing on GPUs’). Eventually, custom designs such as “AI accelerators” and systolic arrays like Google’s TPUs emerged, and continue to emerge. As long as there’s this near-perfect marriage between software and hardware, any alternative must have a quite compelling value proposition.

Profile photo for Quora User

The book ‘Introduction to Machine Learning’ by Alpaydin has a very good explanation of how RBFs compare with feedforward neural nets (NNs). Read section 12.3.

Summary answer: RBFs train faster than NNs but are a less efficient representation and become impractical for a huge number of input dimensions.

Detailed explanation below.

Advantage of NNs over RBFs:

  • RBFs are local representation methods while NNs are distributed representation methods. Local representation methods are inefficient because they require roughly one hidden node per input region, while in the latter a smaller number of hidden nodes suffices to jointly and compactly represent the entire input space in a distributed manner (what is often called a ‘black box’ approach).
  • Thus when the input space is huge, for example due to a huge number of dimensions, it is better to use NNs, and RBFs become impractical. This is the curse of dimensionality.

However: Disadvantage of NNs compared to RBFs:

  • RBFs train faster because each hidden node’s output weight quickly converges, winning out over the other nodes’ outputs when the input values corresponding to that node are presented. On the other hand, when training NNs, presenting the network with inputs from a particular region updates *all* the hidden layer weights, and only gradually at that. This prolongs weight convergence when you present inputs from all regions of the input space, in whatever order, during training.
  • Due to being a distributed representation, NNs are a black box method i.e. by looking at the network weights you cannot intuitively understand how each region of input gets transformed to the output. This is because that transformation is ‘spread’ out over all the hidden weights in a ‘mysterious’ manner that is difficult to fathom intuitively. Thus knowledge/rule extraction from NNs is very difficult. Another way of putting this is that the discriminative features that NNs extract from the data are mysterious and difficult to fathom. On the other hand, the RBFs are relatively far more white box: the RBF centers, spreads and output weights of each hidden node will directly tell you which part of the input region is handled by each hidden node and how it transforms to the output value. The discriminative features figured out by RBFs are nothing but the means and spreads of the hidden nodes. Thus knowledge/rule extraction is far easier. Within the RBF class of algorithms, the competitive methods may be even easier to understand than the cooperative methods because they are explicitly designed to make one hidden node win out over the others.
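
To make the local-versus-distributed contrast above concrete, here is a minimal RBF-network sketch (my own, assuming Gaussian basis functions and a linear output layer fit by least squares); each hidden unit's center and width can be read off directly, which is the 'white box' property described above:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
y = np.tanh(2 * x) + 0.05 * rng.normal(size=x.size)   # arbitrary 1-D target

centers = np.linspace(-3, 3, 15)       # one "local" unit per region of the input
width = 0.5
Phi = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))

w, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # output weights of the linear readout
y_hat = Phi @ w
print(np.mean((y_hat - y) ** 2))
```
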
Profile photo for Roman Trusov

The main purpose of most activation functions is to introduce non-linearity into the network so it is capable of learning more complex patterns. It’s, indeed, the right way of looking at the problem of choice, because good ol’ math is the only known way of preventing screwups.

For educational purposes, it’s very useful to look at these functions in terms of what they actually do. From your physics class you may remember the optics experiments where the light goes through a system of different lenses - this is a perfect analogy. Your input data is the light, it can be all over the spectrum, which may or may not be what you want. So, if you know exactly what each of your lenses does, you can predict how your output would look in principle.

In this post I’ll describe the intuition behind the choices of functions; if you want to really know how things work, refer to Goodfellow et al. (2016).

  1. Linear

    The simplest case, no activation, the signal goes through unchanged. This is a good choice for regression problems.
  2. ReLU/ELU

    [math]max(0, x)[/math]

    Basically, “I want to see only what I am looking for”. This is my favorite function, because it comes simultaneously from electrical engineering and biology. To give a biological analogy, when the incoming signal hits the neuron, it can be recognized as “less relevant” or “more relevant”. By zeroing “less relevant” signals, ReLU acts as an excitatory neuron that reacts on relevant signals and passes the information about them further. That’s one of the reasons why they are very useful in computer vision. A rule of thumb is to use them after convolutional layers, but sometimes they can be used after dense layers if you want more sparse activations.
    I recommend reading more about it, both in the CS literature and the general scientific literature; Nature has an incredibly exciting article that introduced the whole idea.
  3. Softmax

    [math]\frac{e^{x_i}}{\sum\limits_j e^{x_j}}[/math]

    Also known as “give-me-the-probability-distribution” function. If you know that whatever comes out of the layer must be a distribution, this is what you want to use.
    Another famous example is using softmax as a gate. If you have a signal [math]x[/math], a layer [math]L_1[/math], and you want to decide which parts of [math]L_1(x)[/math] to pass further, you can introduce a gate [math]L_2(x)[/math] with softmax activation that has the same dimensionality. An elementwise product [math]L_1(x) \cdot L_2(x)[/math] after training will contain a “filtered” version of [math]L_1(x)[/math], possibly helping to reduce the noise in the following layers.
  4. Tanh

    [math]\frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}[/math]

    In some cases the sign of the output is relevant, but the magnitude can mess with the further computations. Tanh is an elegant way to “squish” the output into the [math][-1, 1][/math] range, preserving the sign and conforming to the boundary conditions [math]f(0) = 0, f'(\pm\infty) = 0[/math]. It’s useful when, after the magnitude of the unprocessed output grows significantly, further growth is not that important, and vice versa, when the fluctuations around zero make a significant difference.

In practical applications, when the architectures grow more complex and you will probably use different activations in different parts of the network, the only way to get better is to get broader exposure to the existing practices, many good papers contain detailed discussions of such choices.
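
A quick numerical sketch of the activations discussed above (my own code, not from the answer; the vectors are arbitrary):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - np.max(x))          # subtract the max for numerical stability
    return e / e.sum()

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(relu(x))                          # negative inputs are zeroed out
print(softmax(x), softmax(x).sum())     # a proper probability distribution
print(np.tanh(x))                       # squashed into [-1, 1], sign preserved

# The "softmax gate" idea: elementwise product of a layer output with a gate.
L1 = np.array([0.3, -1.2, 2.0, 0.7, -0.1])
gate = softmax(np.array([0.1, -3.0, 2.5, 0.0, -1.0]))
print(L1 * gate)                        # mostly the third component survives
```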

Profile photo for LTDC Team

An activation function is a function that sits between the input feeding the current neuron and its output going to the next layer. It basically decides whether the neuron should be activated or not.

Without the activation function, the weights and bias would simply do a linear transformation. A linear equation is simple to solve but is limited in its capacity to solve complex problems and has less power to learn complex functional mappings from data. A neural network without an activation function is just a linear regression model.

There are two types of activation functions:

  • Linear activation function (Identity)
  • Non-linear activation function

Linear activation function

It takes the inputs, multiplied by the weights for each neuron, and creates an output signal proportional to the input.

[math]Z = W_1 X_1 + W_2 X_2 + b[/math]

Range: (-∞, ∞)

Non-linear activation function

Neural network models use non-linear activation functions. They allow the model to create complex mappings between the inputs and outputs of the network, which are essential for learning and modeling complex data such as images, audio, and video.

Some commonly used non-linear activation functions are,

1. ReLU (Rectified Linear Unit)

[math]f(z) = \max(0, z)[/math]

Range: [0, ∞)

2. Sigmoid activation function

3. tanh activation function
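
A small sketch (not from the answer) of the point made above: stacking linear layers without an activation collapses to a single linear map, while inserting a ReLU between them does not. The matrices are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

two_linear = W2 @ (W1 @ x + b1) + b2
collapsed = (W2 @ W1) @ x + (W2 @ b1 + b2)            # one equivalent linear layer
print(np.allclose(two_linear, collapsed))              # True

with_relu = W2 @ np.maximum(0.0, W1 @ x + b1) + b2     # no such collapse in general
print(with_relu)
```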

Profile photo for Shashank Shiv

With the advent of Artificial Intelligence, Deep Learning and the like, I’m gonna assume you meant:

What are Artificial Neural Networks really all about?

Before I get to answering that, I’ll actually go ahead and tell you what a Neural Network is.

A human brain consists of these tiny cells called neurons which look like this,

Now this is one single neuron among millions of such neurons in your brain. They are all interconnected with one another, leading to billions of such connections. These connections play a major decision-making role in helping us decipher a dog from a cat. Every time you see a picture of a dog, a small electric impulse goes from one neuron to the other through a part of the network to tell you that it’s a dog. Basically, a network of neurons is a decision-making entity - if dog then impulse; if not dog then no impulse. This is a biological neural network in a nutshell.

Now, an Artificial Neural Network or ANN is exactly what it sounds like, Artificial. It’s a model of what our brain does. ANNs are represented like this,

Every circle (node) in the above image is analogous to the biological neuron you saw before, and every arrow between the circles is a connection between two neurons. An impulse in ANNs is modeled as a weight. A weight in an ANN is basically a real number which indicates how strongly the neuron thinks it’s a dog or not: if it is a large positive weight, then it’s more likely to be a dog; if it is negative, it’s much less likely to be a dog.

So how does an ANN learn?

Like how as children we were all taught the difference between a dog and a cat, we have to feed data of dogs and cats for the ANN to learn. This is called Supervised Learning, where we input dog pictures and tell the ANN this is a dog. Then we test its efficacy by feeding it different dog pictures, like a test or an exam.

Another form of learning is called Unsupervised Learning, where a lot of pictures of dogs and cats are basically shown to the ANN and it has to segregate them into two bins based on the similarities of the data.

It learns by modifying its weights based on the output it predicts versus the output it was supposed to give. Every time it predicts a dog picture as a cat, its weights are modified in the direction that is less likely to make the same mistake.

This is what Artificial Neural Networks are on a broad scale, without getting into the mathematics of it.
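
For readers who do want a peek at the mathematics, here is a toy illustration (my own, with made-up two-feature data) of "modify the weights in the direction that is less likely to make the same mistake": a single sigmoid neuron trained by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
labels = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)   # synthetic "dog vs. not dog"

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probability of "dog"
    w -= lr * (X.T @ (p - labels)) / len(X)  # push weights away from the error
    b -= lr * np.mean(p - labels)

p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
print(np.mean((p > 0.5) == labels))          # accuracy on the training data
```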

Profile photo for Christine

Yes, there is evidence that recurrent neural networks (RNNs) are universal approximators, but with some nuances. Here's a breakdown:

  • Universal Approximation Theorem: This theorem states that a sufficiently large feed-forward neural network with one hidden layer using specific activation functions can approximate any continuous function to an arbitrary degree of accuracy.
  • RNNs and Universal Approximation: While RNNs differ from standard feed-forward networks due to their recurrent connections, research suggests that they too can be considered universal approximators under certain conditions.
  • Key Papers: The classic universal approximation results of Hornik, Stinchcombe, and White (1989) cover feed-forward networks; for recurrent networks, Schäfer and Zimmermann (2006), "Recurrent Neural Networks Are Universal Approximators", is a foundational work in this area. It proves that RNNs in state-space form with a single hidden layer and a suitable activation function can approximate any continuous function on a compact set.
  • Nuances and Considerations: The proof relies on specific conditions, such as the network having a finite number of states and specific activation functions. The practicalities of achieving universal approximation with RNNs can be more challenging than with feed-forward networks due to issues like vanishing gradients and the complexity of training recurrent architectures. The theorem focuses on approximating continuous functions; RNNs might not be ideal for approximating all types of functions (e.g., discontinuous functions).

In conclusion, there is theoretical evidence supported by research that RNNs can be considered universal approximators under certain conditions. However, the practicalities of achieving this and the types of functions they can approximate effectively require further consideration.

Here are some additional points to keep in mind:

  • The field of neural networks is constantly evolving, and new research might provide further insights into the universal approximation capabilities of RNNs and other neural network architectures.
  • Even if a network is theoretically capable of universal approximation, training it effectively to achieve that level of accuracy can be computationally expensive and might not always be necessary for practical applications.
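
For reference, the "state-space form" mentioned above is easy to write down; here is a minimal untrained sketch (my own, with random weights) just to show the recurrence [math]h_t = \tanh(A h_{t-1} + B x_t)[/math], [math]y_t = C h_t[/math]:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, inputs = 8, 1
A = 0.5 * rng.normal(size=(hidden, hidden))
B = rng.normal(size=(hidden, inputs))
C = rng.normal(size=(1, hidden))

def rnn(x_seq):
    h = np.zeros(hidden)
    ys = []
    for x_t in x_seq:
        h = np.tanh(A @ h + B @ np.atleast_1d(x_t))   # state update
        ys.append(C @ h)                              # read-out
    return np.concatenate(ys)

print(rnn(np.sin(np.linspace(0, 6, 50))))             # untrained, just the form
```
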
Profile photo for Asraful Sharker Nirob

Neural networks, particularly deep feedforward networks, are universal approximators, meaning they can approximate any continuous function under certain conditions. However, there are some functions that they cannot approximate well:

  1. Discontinuous Functions – Neural networks struggle with functions that have sharp discontinuities, such as the step function (e.g., the Heaviside function) and functions with sudden jumps in values.
  2. Highly Oscillatory Functions – Functions with extreme variations over small intervals, like the Dirichlet function (which is 1 for rational numbers and 0 for irrationals) and the sign function, which changes abruptly at zero.
  3. Non-Measurable Functions – Functions that are not Lebesgue measurable, like some constructs from set theory, cannot be represented by neural networks.
  4. Computationally Undecidable Functions – Problems related to the halting problem or non-Turing-computable functions.
  5. Sparse and High-Frequency Functions – Some functions with very high-frequency details or fractal-like structures may require an impractically large network to approximate.

In practice, neural networks can approximate most real-world functions if given enough layers and training data, but they still face challenges with sharp discontinuities, infinite complexity, or logical non-computability.

Profile photo for Eugene Yarovoi

Most other types of neural networks have what are called inductive biases, baked-in assumptions about the structure of the data that constrain what functions the network can express. This is done intentionally to reduce the number of parameters that the model needs. Let’s take an image classification task. If we accept an input 300x300x3 (color channels) image, we have 270,000 input features! If we were to process these unconstrained via a fully-connected network, with no assumptions like “we’re looking for the same types of signals in every part of the image”, we would probably need millions of neurons in each hidden layer, and trillions of weights (a fully connected layer between two 1M neuron layers has 1T weights)! Such models would be hopelessly too large to train.
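
A back-of-the-envelope check of those counts (my own arithmetic; the 3x3, 256-channel convolution at the end is an assumed example, not something from the answer):

```python
image_inputs = 300 * 300 * 3            # 270,000 input features
print(image_inputs)

dense = 1_000_000 * 1_000_000           # fully connected 1M -> 1M layer: 1e12 weights
print(dense)

conv = 3 * 3 * 256 * 256                # a 3x3 conv, 256 -> 256 channels, shared weights
print(conv)                             # ~590k weights, reused at every image location
```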

In a CNN, we assume that we can share the weights for each convolution regardless of where it happens in the image. That’s because our understanding of the underlying task (let’s say image classification) is that an object is the same kind of object regardless of where in an image it occurs. We generally assume a beach ball is a beach ball regardless of whether it occurs on the left or right side of an image, and that its overall structure (its constituent geometric features that make us say “this is a beach ball”) is largely unchanged by its location in an image.

In an RNN, we assume that we have a sequence that is to be treated according to some rules that are (roughly) unchanging over the input elements (e.g. over time if the sequence is a time series). In other words, if there is some useful rule to be learned like “if you see two 1s in a row, classify this instance as positive”, such a rule might at least possibly apply to all the positions of the sequence. RNNs can be a little bit more flexible, in that they can be modified to accept the position of the input element as an input, and therefore have some capacity to learn to do certain things only at specific input positions. Furthermore, RNNs can implicitly learn position information based on the order the input tokens are presented (they are a stateful model over the sequence of tokens). However, their architecture still (quite intentionally) biases them towards finding rules that are more generally applicable over many input positions. If these networks had to re-learn all their rules from scratch at each position, the architecture would again have too many parameters, and would also completely fail to generalize to sequence sizes not seen during training.

So, can you use CNNs and RNNs to approximate an unknown function mapping, the way that you can use a feedforward network? I’m assuming the way you would try to do so for an RNN would be to feed the function’s inputs into the model in a pre-defined order. In other words, if your function has 100 real-valued inputs, you would call the RNN with a sequence of size 100, with each feature as one element in the sequence. In theory, an RNN is able to take the order of elements into account, so for a big enough network, the representational power is there, at least in theory. However, because the network has a strong bias towards trying to treat different timesteps in a similar manner, the network’s performance would suffer from learning many spurious relationships between features as it would try to treat many completely unrelated features as though they represent similar data.

(Of course, you could frame the task as a single timestep where you would supply a vector of size 100, but this is not very different from a feedforward network then.)

For a CNN, I assume you would take your 100 features and encode them (in some arbitrary order) into a 10x10 grid. Here the network is hard-wired to assume that some relationship extracted between adjacent features in one part of the “image” is somehow meaningful in another part of the “image”, but this is very bad because the features are arranged arbitrarily and no such meaning exists. Even the notion of extracting the relationship between “adjacent” features doesn’t make sense, since the features are just arranged on the grid arbitrarily.

The only meaningful relationships the CNN’s layers can learn, then, are patterns that are useful to recognize somewhere in the image, and that mostly won’t occur anywhere else as a false signal. In particular, the CNN is designed to be largely invariant to translation, so if for example you are supposed to learn to classify an instance as positive when feature1 = 2 and feature2 = 3, and negative if feature3 = 2 and feature4 = 3 and, by the coincidental arrangement of features, feature3 and feature4 are positioned in just the right manner in the grid relative to feature1 and feature2, the CNN will be forced to learn that either both instances are positive, or both are negative, or not conclude anything from either scenario, since the CNN is constrained to apply the same rule in both cases. So, CNNs in general should not be used to represent arbitrary functions.

In general, a model that possesses a certain inductive bias should not be used on problems where that bias does not make sense. Some models are more expressive than others and can still explicitly or implicitly capture the additional information relative to which we assume the answer is mostly invariant (e.g. position in an RNN). Even then, the presence of a bias towards finding rules that are invariant relative to some attribute (e.g. location, time) will likely hurt the model’s performance when no such invariant rules exist.


Some more advanced content if you understand the concept of embeddings:

Interestingly, some recent architectures have shown that the bias of applying the same rules to disparate features is not necessarily unreasonable. TabTransformer is a transformer-based architecture for tabular data that does this, in a way. A transformer can be thought of as an architecture similar in some ways to a CNN, except that instead of creating information flow between adjacent positions, each position in the sequence can “search” all other positions in the sequence for some sort of data, intuitively some kind of similarity. For example, if we are processing a word in a sentence to prepare to translate it, words related to it are also brought into the context.

In TabTransformer, we take all categorical features and encode them via embeddings. The embeddings are then treated as elements of a sequence, with a bias of treating them homogenously, even though we are careful to include positional information for each feature to ensure that the model can still tell features apart and treat them differently if needed. This approach works because the embeddings are learnable, and in effect, what we are learning is how to encode elements in a shared embedding space so that elements with similar embeddings can in fact be treated similarly.

Profile photo for Paras Upadhyay

Neural networks are a fundamental component of deep learning, a subset of machine learning inspired by the human brain's neural structure. Here’s how they function and some of their applications:

Functioning of Neural Networks:

  1. Neurons and Layers: Neural networks consist of interconnected nodes called neurons organized in layers. Typically, there are three types of layers:
    • Input Layer: Receives initial data.
    • Hidden Layers: Process inputs through weighted connections and activation functions.
    • Output Layer: Produces the final output based on the processed information.
  2. Connections: Neurons in one layer are connected to neurons in the next layer through weighted connections. These weights determine the strength of influence one neuron has on another.
  3. Activation Functions: Each neuron applies an activation function to its weighted sum of inputs to introduce non-linearity, allowing the network to model complex relationships in data.
  4. Training: Neural networks learn by adjusting weights based on the error between predicted and actual outputs during training. This process, often using algorithms like backpropagation, aims to minimize this error through optimization techniques like gradient descent.

Applications of Neural Networks:

  1. Image and Speech Recognition: Convolutional Neural Networks (CNNs) are used for tasks like image classification, object detection, and facial recognition. Recurrent Neural Networks (RNNs) are employed for speech recognition and language translation.
  2. Natural Language Processing (NLP): NLP tasks such as sentiment analysis, language generation, and named entity recognition benefit from recurrent and transformer-based architectures like GPT (Generative Pre-trained Transformer).
  3. Recommendation Systems: Neural networks power recommendation algorithms on platforms like Netflix and Amazon, suggesting products or content based on user preferences.
  4. Medical Diagnosis: They assist in diagnosing diseases from medical images and predicting outcomes based on patient data.
  5. Autonomous Vehicles: Neural networks enable object detection, path planning, and decision-making for self-driving cars.
  6. Financial Forecasting: They analyze trends in financial data for stock market predictions, risk assessment, and algorithmic trading.
  7. Gaming: Neural networks are used to create intelligent agents in games, capable of learning and adapting to player behavior.
  8. Robotics: They play a role in robotic control systems, allowing robots to perceive their environment and make decisions accordingly.

Neural networks have revolutionized many fields by providing powerful tools for processing complex data, learning patterns, and making predictions or decisions with increasing accuracy. Their versatility and ability to handle large datasets have made them a cornerstone of modern artificial intelligence applications.

Profile photo for PK

There are different activation functions, some of them are used in Neural Networks (NN).

  • Sigmoid
  • Hyperbolic Tan or Tanh
  • Rectified Linear Unit or ReLU
    • Leaky ReLU
    • Maxout
  • Softmax

In a NN, there are two places where the choice of activation function becomes significant. One is at every neuron: as we know, the process at every neuron is to take the inputs, apply the weights, add the bias, and apply the activation function. The second place where the activation function is significant is the final layer of the NN. When a NN is used for a classification job, it can be either binary classification (Yes/No) or multi-class classification. If it is a NN which performs binary classification, then Sigmoid is the activation function used in the final layer. If the NN is designed to perform multi-class classification, then Softmax is used as the activation function at the final layer. And at every neuron, the most commonly used activation function is ReLU.
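
As a sketch of that convention (my own example using Keras as one concrete library; the layer sizes and the 20-feature input are placeholders): ReLU in the hidden layer, sigmoid for a binary output, softmax for a multi-class output.

```python
import tensorflow as tf

binary_model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),      # yes/no probability
])
binary_model.compile(optimizer="adam", loss="binary_crossentropy")

multiclass_model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),     # distribution over 10 classes
])
multiclass_model.compile(optimizer="adam", loss="categorical_crossentropy")
```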

Profile photo for Daniel Watson

Well, this depends on what do you mean by a “learnable” function. Since you don’t provide any definition, I’ll answer the question in regards to universal approximation.

Universal approximation only holds for functions in compact domains, so neural nets don’t have this property for any differentiable function (e.g. a function whose domain is on R^n, but they do on any compact subsets of R^n, which in a practical way just means that the inputs are in a bounded range). Strengthening your hypothesis, this property holds for all such continuous functions, which happens to be necessary for them to b

Well, this depends on what you mean by a “learnable” function. Since you don’t provide any definition, I’ll answer the question with regard to universal approximation.

Universal approximation only holds for functions on compact domains, so neural nets don’t have this property for every differentiable function (e.g. a function whose domain is all of [math]\mathbb{R}^n[/math]), but they do on any compact subset of [math]\mathbb{R}^n[/math], which in practical terms just means that the inputs are in a bounded range. Strengthening your hypothesis, this property holds for all such continuous functions, which happens to be necessary for them to be differentiable but not vice versa.

Profile photo for Ygor Rebouças

A universal approximator is something that can be quite close to anything given enough resources. For instance, given a function F, a neural network can approximate F up to some maximum error (say, 0.05) given enough neurons.

That doesn’t mean that much, to be honest. Decision trees are universal approximators as well: given enough nodes, a tree can model anything up to some error threshold.
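
To see this concretely, here is a small sketch (assuming scikit-learn is available; the data and depths are arbitrary) showing the worst-case error of a regression tree shrinking as it is given more nodes:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(2000, 1))
y = np.sin(X).ravel()                          # target function F(x) = sin(x)

for depth in (2, 5, 10):
    tree = DecisionTreeRegressor(max_depth=depth).fit(X, y)
    err = np.max(np.abs(tree.predict(X) - y))  # worst-case error on the sample
    print(f"depth={depth:2d}  max error={err:.3f}")
# The error shrinks as the tree is allowed more nodes (greater depth).
```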

Profile photo for Amira Al-Sharkawy

In 2012, Alex Krizhevsky found that the ReLU activation made training about six times faster than a traditional activation like tanh. In the figure he published (not reproduced here), ReLU is represented by the solid line and tanh by the dashed line.
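
The usual explanation is that ReLU does not saturate: its gradient stays at 1 for positive inputs, while the tanh gradient shrinks toward zero for large inputs. A tiny NumPy sketch of the two derivatives (illustrative only):

```python
import numpy as np

z = np.linspace(-6, 6, 7)

tanh_grad = 1.0 - np.tanh(z) ** 2      # derivative of tanh: vanishes for large |z|
relu_grad = (z > 0).astype(float)      # derivative of ReLU: stays at 1 for all z > 0

for zi, tg, rg in zip(z, tanh_grad, relu_grad):
    print(f"z={zi:+.1f}  d tanh/dz={tg:.4f}  d ReLU/dz={rg:.0f}")
```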

Profile photo for Dr. S. Pradeep

Recurrent Neural Networks (RNNs) are a class of neural networks that are particularly powerful for processing sequential data, and there is theoretical evidence to suggest that they can act as universal approximators. This concept stems from their ability to model any dynamic system given a sufficient number of hidden units and proper weights. Siegelmann and Sontag (1995) provided a seminal contribution to this field by showing that a certain class of RNNs with sigmoid activation functions can simulate Turing machines, thus demonstrating their capability to approximate any computable function to an arbitrary degree of accuracy. This result hinges on the network's recursive nature, which allows it to maintain a form of memory by processing inputs through time, making it theoretically capable of capturing complex dynamics in data sequences. While the universal approximation property of RNNs is theoretically compelling, it's important to note that in practice, the ability to train these networks to realize such potential can be challenging due to issues like vanishing and exploding gradients. Nevertheless, advancements in architecture designs, like Long Short-Term Memory (LSTM) networks, have made significant strides in mitigating these problems, further enhancing the practical applicability of RNNs as universal approximators for sequential data.
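
For concreteness, here is a minimal NumPy sketch (sizes and weights are made up) of the recurrence that gives an RNN its memory; each step mixes the new input with the previous hidden state:

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid = 3, 5
W_xh = rng.normal(scale=0.5, size=(n_hid, n_in))   # input -> hidden weights
W_hh = rng.normal(scale=0.5, size=(n_hid, n_hid))  # hidden -> hidden (the "memory" path)
b = np.zeros(n_hid)

def rnn_forward(xs):
    """Run a simple tanh RNN over a sequence and return the final hidden state."""
    h = np.zeros(n_hid)
    for x_t in xs:
        h = np.tanh(W_xh @ x_t + W_hh @ h + b)     # h_t depends on x_t and h_{t-1}
        # Repeated multiplication by W_hh is also why gradients can vanish or explode.
    return h

sequence = rng.normal(size=(10, n_in))             # a length-10 input sequence
print(rnn_forward(sequence))
```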

Profile photo for John Light

Overfitting, per se, is a problem of what we call machine learning (including deep learning), not of neural networks generally. It results from the artificial (!) requirement that a single kernel separate two vast populations in a multidimensional space. This process of applying kernels to high-D spaces is just the current attempt to simulate thinking with math. It’s a powerful tool for solving all sorts of computer problems, but it’s not artificial intelligence by any means.

I can “prove” my assertion by pointing at a variety of biological brains such as our own, all of which are examples of neural networks, and which don’t obviously exhibit overfitting. They all exhibit failures that have the same effect as overfitting, but these are so much smaller in magnitude and so much more easily repaired that they aren’t the problem we see in our current machine learning algorithms. In biological systems we refer to such failures as false positives and false negatives, and they aren’t nearly as much of a problem as overfitting is in machine learning.

I’m including this Quora AI image because it represents the fundamental issue of dealing with vast multidimensional spaces.

Neural networks make mistakes all the time, so that’s not really the issue. The issue is the cost of a mistake. Mistakes in the choice of a kernel for a machine learning system, especially a deep learning system, are extremely costly, for several reasons. First, the training of an ML/DL system, which involves the construction of one or many kernels, is expensive, so a kernel failure represents an expensive failure. Second, as overfitting/underfitting problems are discovered, the only remedy is to repeat the expensive learning process after a potentially expensive study of the first kernel’s faults. Third, one expensive remedial process is never enough, so it is likely that the remediation will have to be repeated multiple times as faults are discovered. Fourth, it is often unlikely that the remediation cycles will converge on a stable solution, as new failures are found and the kernel undergoes reconstructions which generate new faults.

This necessity of repeated, expensive kernel reconstruction is why machine learning is consuming ever-increasing amounts of money and energy and releasing ever more carbon. In other words, the current machine learning paradigm for simulating artificial intelligence is doomed to lose its fight with the law of diminishing returns as it consumes ever larger portions of mankind’s energy reserves.

There may be many solutions to this conundrum, but an obvious one to pursue is the understanding of how biological brains work. I am following the work of Karl Friston, who proposed the free energy principle of the brain, at VERSES, a new AI company using his principle.

Profile photo for Srihitha Trovagunta

Here I am assuming you just need a basic definition to understand what exactly a neural network is.

It is often referred to as an artificial neural network. Technically, a neural network is a computer architecture in which many interconnected processing elements operate in a way loosely similar to the neurons in a human brain.

It is organised in the form of layers (input, output, and hidden) which are connected by nodes. There can be one or more hidden layers through which the input layer sends its responses to the output layer.
A hidden layer is basically the bridge, or link, between the input and output layers.

Every node contains something called an activation function, which helps determine its output. All the neurons (nodes) in a particular layer use the same type of activation function.

Profile photo for Yoshua Bengio

I don't think they are. In fact I suspect it might be the other way around, and it would not be because of the sigmoid but because of the half-planes vs radial decision surfaces.

Profile photo for Kevin Swingler

No. The standard Hopfield network, with connections between every pair of neurons, is not a universal approximator. It can only represent energy functions with second-order interactions, since the weights only connect pairs of neurons.

If we restrict ourselves to binary variables (each variable is in {-1,1}) then adding higher order connections to a Hopfield network (i.e. weights that connect neurons in groups of 3 or 4 or higher …) will produce a universal model. In fact, it becomes equivalent to a Walsh basis set, meaning it can represent any real valued function of binary inputs.
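
As a rough sketch of what “second order only” means (toy NumPy code, with states in {-1, 1}): the standard energy has one weight per pair of neurons, and a higher-order network simply adds terms over triples (and so on):

```python
import numpy as np

def energy_pairwise(s, W):
    """Standard Hopfield energy: only pairwise (second-order) interactions."""
    return -0.5 * s @ W @ s

def energy_third_order(s, W, T):
    """Add a third-order term: weights T[i, j, k] connect neurons in groups of three."""
    return energy_pairwise(s, W) - np.einsum("ijk,i,j,k->", T, s, s, s)

rng = np.random.default_rng(0)
n = 4
s = rng.choice([-1, 1], size=n)        # binary state vector in {-1, 1}
W = rng.normal(size=(n, n))
W = (W + W.T) / 2                      # symmetric pairwise weights
np.fill_diagonal(W, 0)                 # no self-connections
T = rng.normal(scale=0.1, size=(n, n, n))

print(energy_pairwise(s, W), energy_third_order(s, W, T))
```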

Here is a paper about such networks if you are interested: A comparison of learning rules for Mixed Order Hyper Networks

Profile photo for Robert A. Levinson, Phd

Certainly! While neural networks have proven to be highly effective in various domains, it is true that they can sometimes suffer from overfitting issues. However, there are alternative methods that can achieve comparable accuracy while mitigating overfitting problems. Two prominent approaches are regularization techniques and ensemble methods.

Regularization techniques, such as L1 and L2 regularization, aim to prevent overfitting by adding a penalty term to the loss function. This penalty term discourages the model from assigning excessive importance to certain features or parameters, thus promoting more generalizable solutions. Regularization techniques effectively reduce overfitting by imposing constraints on the model's complexity, striking a balance between accuracy and generalization.
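
As a bare-bones sketch (NumPy, squared-error loss; the data is synthetic), the penalty is literally just an extra term added to the loss, and its gradient pulls the weights toward zero:

```python
import numpy as np

def ridge_loss_and_grad(w, X, y, lam):
    """Squared-error loss with an L2 (ridge) penalty lam * ||w||^2."""
    resid = X @ w - y
    loss = np.mean(resid ** 2) + lam * np.sum(w ** 2)
    grad = 2 * X.T @ resid / len(y) + 2 * lam * w   # penalty term shrinks the weights
    return loss, grad

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, 0.0, 0.0, 0.0, 0.0]) + 0.1 * rng.normal(size=100)

w = np.zeros(5)
for _ in range(500):                                # plain gradient descent
    loss, grad = ridge_loss_and_grad(w, X, y, lam=0.1)
    w -= 0.05 * grad
print(w)                                            # weights are shrunk relative to lam=0
```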

Ensemble methods offer another solution to combat overfitting. These methods involve combining multiple models, often referred to as base or weak models, to create a more robust and accurate ensemble model. By leveraging the diversity of these individual models, ensemble methods can effectively reduce overfitting. Techniques like bagging, boosting, and stacking are widely used in ensemble learning, providing a way to achieve comparable accuracy while minimizing overfitting risks.

Additionally, decision trees and random forests are alternative methods that can offer comparable accuracy with reduced overfitting. Decision trees are intuitive and easy to interpret models that can handle both categorical and numerical data. Random forests, which are ensembles of decision trees, further enhance accuracy by aggregating predictions from multiple trees while reducing overfitting. The randomness introduced during the training process helps to generalize the model's performance.
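
A quick illustrative comparison with scikit-learn (toy noisy data; exact numbers will vary): a single unconstrained tree tends to fit the noise, while the forest averages much of it away:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X).ravel() + 0.3 * rng.normal(size=400)   # noisy nonlinear target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

tree = DecisionTreeRegressor().fit(X_tr, y_tr)               # unconstrained depth: overfits
forest = RandomForestRegressor(n_estimators=200).fit(X_tr, y_tr)

print("tree   test R^2:", tree.score(X_te, y_te))
print("forest test R^2:", forest.score(X_te, y_te))
```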

In conclusion, while neural networks are powerful, alternative methods such as regularization techniques, ensemble methods, decision trees, and random forests can provide comparable accuracy while addressing overfitting concerns. It is essential to carefully consider the specific problem at hand and explore these alternatives to find the most suitable approach for achieving accurate and robust results. Good Luck!

Profile photo for Rehan Jutt

In the realm of theoretical computer science, the universal Turing machine is indeed a fundamental concept. It serves as a theoretical model for computation, capable of simulating any other Turing machine and thus embodying the concept of universality in computation. However, there are other models and variations of Turing machines that have been proposed and studied over time. Here are a few:

  1. Probabilistic Turing Machines: These are Turing machines that incorporate randomness into their operations. Instead of deterministic transitions, probabilistic Turing machines have transition probabilities associated with each possible move.
  2. Quantum Turing Machines: Quantum Turing machines extend classical Turing machines to incorporate principles from quantum mechanics. They operate on quantum bits (qubits) and have quantum gates as part of their operations, allowing for potential exponential speedup over classical computation in some cases.
  3. Multi-Tape Turing Machines: In addition to the single-tape Turing machine model proposed by Alan Turing, there are variations with multiple tapes. Multi-tape Turing machines have several tapes, each with its own head, allowing for potentially more efficient computation for certain tasks.
  4. Non-deterministic Turing Machines: These machines can make non-deterministic choices at each step. They can explore all possible choices simultaneously, akin to a tree search, and accept if at least one of the branches leads to an accepting state.
  5. Parallel and Distributed Turing Machines: These models extend the Turing machine concept to parallel and distributed systems. They allow multiple processors to work simultaneously, communicating and coordinating their actions to perform computation.

While the universal Turing machine is a cornerstone of theoretical computer science and computation theory, these variations and extensions offer insights into different aspects of computation. The choice of model often depends on the specific problem being studied or the characteristics of the computing environment being considered. Each model has its own set of strengths and weaknesses, making them suitable for different types of analysis and applications.

Profile photo for Quora User

To date, there are over two hundred different approaches to neural networks. Many of these approaches to neural networks require the skilled operator to manually do pruning of the connectivity matrices (synaptic pruning) in software programs. In hardware versions of neural networks, the synaptic pruning that is manually done is either done before the software is converted to hardware or at the time of the hardware design itself. Robotronics LLC | Facebook

Profile photo for Ritesh Sharma

Neural networks are basically a rough simulation of the human brain.

Computers have some very good skills, such as seeing and listening, because we have built them that way.

Our brain works because of the neurons in it and the connections between them, and that is how we make decisions in our day-to-day life.

First we observe the thing itself; let’s represent it as ‘x’. Then there is the weight it carries in making the decision; call that ‘w’.

When it comes to decision making, a brain and a machine differ quite a lot: the machine needs a bias term, and the whole prediction should lie between 0 and 1. The sigmoid function fits this role, since it only gives values between 0 and 1.

So let’s say we have put together the formula for a really simple neural network with only one neuron:

neuron = sigmoid(w*x - bias)

and this is where everything starts to become difficult.
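
In code, that one-neuron model is only a couple of lines (NumPy; the input, weight, and bias values are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, bias):
    """A single sigmoid neuron: weighted input minus bias, squashed into (0, 1)."""
    return sigmoid(w * x - bias)

print(neuron(x=2.0, w=0.8, bias=0.5))   # a prediction between 0 and 1
```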

The ideas of perceptrons and convolutions are also very effective.

What really complicates the whole story are the hidden layers of the neural network.

Profile photo for Quora User

Sure, if the NN is built properly. A single-unit, single-layer net can only output a linear or very simple nonlinear function, whereas more layers allow more complex approximations.
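
A quick way to see the difference (scikit-learn, toy one-dimensional data): a purely linear model cannot fit a curved target, while a net with one hidden layer can:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(1000, 1))
y = np.sin(X).ravel()                                   # a nonlinear target

linear = LinearRegression().fit(X, y)                   # no hidden layer: a straight line
mlp = MLPRegressor(hidden_layer_sizes=(50,), max_iter=3000,
                   random_state=0).fit(X, y)            # one hidden layer of 50 units

print("linear R^2:", linear.score(X, y))                # poor fit
print("MLP    R^2:", mlp.score(X, y))                   # close to 1
```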

Profile photo for Alex Seewald

One paper addressing the problem of pruning neurons in a principled way that I find interesting is the 2016 ICLR paper Diversity Networks. They select which neurons to prune at any particular layer by building a kernel from the activations of the neurons and sampling a subset of neurons to keep with a so-called Determinantal Point Process. They choose this sampling strategy because DPPs produce sampled subsets with probability proportional to the determinant of their kernel (which is intuitively related to the idea of diversity of neural activations if you spend some time thinking about it; e.g. a constant matrix has a zero determinant). Along with the interesting motivation, they experimentally show this working better than random pruning (especially when the sampled neurons are then jointly optimized to fit the previous output of the non-pruned layer).
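
This is not the authors’ exact procedure, but a rough sketch of the underlying idea under the simplest assumptions: build a kernel from the neurons’ activations and greedily keep the subset whose kernel determinant (a proxy for diversity) is largest:

```python
import numpy as np

rng = np.random.default_rng(0)
activations = rng.normal(size=(50, 200))      # 50 neurons x 200 inputs (made-up data)

# Kernel of neuron activations; similar neurons -> nearly dependent rows -> small determinant.
A = activations / np.linalg.norm(activations, axis=1, keepdims=True)
L = A @ A.T + 1e-6 * np.eye(len(A))

def greedy_diverse_subset(L, k):
    """Greedily pick k neurons that maximize det(L[subset, subset])."""
    chosen = []
    for _ in range(k):
        best, best_det = None, -np.inf
        for i in range(len(L)):
            if i in chosen:
                continue
            idx = chosen + [i]
            det = np.linalg.det(L[np.ix_(idx, idx)])
            if det > best_det:
                best, best_det = i, det
        chosen.append(best)
    return chosen

keep = greedy_diverse_subset(L, k=10)         # indices of neurons to keep; prune the rest
print(sorted(keep))
```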

This method is not specific to a particular architecture, so it's a bit of a mismatch to your question but this is maybe the sort of thing you wished to learn about.

Profile photo for Matúš Pikuliak

Yes. Training a neural network is basically an optimization task: you want to optimize the parameters of the network so that the loss function you defined is minimal. There is a plethora of optimization algorithms, but most of them can’t really deal with the number of parameters in common neural networks (millions of them). The simplicity and speed of the backpropagation + stochastic gradient descent combination made it the most used training method for neural networks.
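
Here is a minimal sketch of that combination on a toy model (NumPy, squared-error loss): the gradient plays the role backpropagation plays in a real network, and each update uses a small random mini-batch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(3)                       # the parameters we optimize
lr, batch = 0.05, 32
for step in range(2000):
    idx = rng.integers(0, len(y), size=batch)        # a random mini-batch
    Xb, yb = X[idx], y[idx]
    grad = 2 * Xb.T @ (Xb @ w - yb) / batch          # gradient of the squared-error loss
    w -= lr * grad                                   # one stochastic gradient descent step
print(w)                                             # close to true_w
```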

However, recently some researchers have started to experiment with evolutionary algorithms and have been able to obtain interesting results. You can check this paper for more details:

http://eplex.cs.ucf.edu/papers/morse_gecco16.pdf

Profile photo for Varun Agrawal

One algorithm my advisor loves to use is called “Graduate Student Descent”: given a problem and a grad student, the student will try all possible hyperparameters until one of them solves the problem.

Activation layers are a type of hyperparameter, and you’ll need to experiment with all of them in order to find which works best for you. You can narrow your search by referring to prior work in the field for your particular problem. For example, it has already been shown that tanh activations work better for image classification while leaky ReLUs work better for temporal sequences such as video.
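
In practice that search is often just a loop. Here is a sketch using scikit-learn’s MLPClassifier on synthetic data, trying each of its built-in activations (it does not offer leaky ReLU or maxout) and keeping whichever scores best on a validation split:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for act in ("relu", "tanh", "logistic"):             # the activations MLPClassifier supports
    clf = MLPClassifier(hidden_layer_sizes=(32,), activation=act,
                        max_iter=500, random_state=0).fit(X_tr, y_tr)
    results[act] = clf.score(X_val, y_val)

print(results)                                        # pick whichever scores best
```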

Profile photo for Kandasamy Illanko

Here is the answer for non-scientists:

Suppose there is a set of factors that affect an outcome.

The factors and the output can be quantized, sometimes even when they are subjective. An extreme example would be the pain scale.

Now imagine you have lots of information from the past regarding separate instances of these factors and the outcome they produced. Suppose you wish to use this knowledge to predict the outcome at a particular instance in the future, given the factors at that instance.

Traditionally, you would have called a mathematician or a statistician. They would have used a mathematical method that used the quantized data to come up with a mathematical formula for the outcome as a function of the factors.

Unfortunately, the mathematical methods available for this purpose have severe limitations. The methods can be used only when we know what elementary mathematical functions are involved in the formula, beforehand. That is not always possible. We have accumulated a vast amount of information in specific fields where we have no clue what elementary functions might be involved in the relationships. Skip the next two paragraphs if seeing x brings back painful memories.

For example, we need to know if the formula involves x, x^2, or x^3. Often it is worse than this. The formula could involve a fractional power of the x, or logarithm of x. It could involve a derivative of x.

If we know x^2 is involved, mathematics can help us determine what multiple of it is involved, like 3x^2 or 5x^2. Under certain conditions, we can determine what power of x is there, but this is very rare. This is the extent to which mathematics can help us with this particular problem.

The thing to do these days is to consult a neural network and deep learning expert.

The remarkable thing about a neural network is, that no matter how complicated the underlying relationship between the factors and the outcome, as long as there is a dependency between the outcome and the factors, we can use it for future predictions, without any worry about the mathematics involved.

The information from the past is used to train the neural network. A neural network is a computer program that remains flexible at the beginning. We give one instance of the factors to the program and ask the program the prediction for the outcome. If the prediction does not match the outcome in the information from the past (for that particular instance of the factors), we give the program a gentle nudge. Something inside the program changes.

If the prediction matches the observed outcome from the past, we do not do anything. We certainly do not give the neural network program a sticker!

Now we move on to another instance of the past information and repeat the procedure. When about half the past information is used up, we stop the training.

We now use the other half of the information to test if the neural network can make reliable predictions. This is called testing. Suppose for about 95% of the remaining instances of the factors, the program’s predictions match the past information; wouldn’t you be happy? You will have confidence in the neural network (and the training), and you will deploy it for future application.
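
The procedure described above looks roughly like this in code (scikit-learn, with synthetic data standing in for the “past information”):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# "Past information": instances of the factors and the outcome they produced.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# Use about half for training and keep the other half for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(20,), max_iter=1000, random_state=0)
net.fit(X_train, y_train)                 # each training example "nudges" the weights

accuracy = net.score(X_test, y_test)      # fraction of test predictions that match
print(f"Predictions match the held-out past information {accuracy:.0%} of the time")
```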

Profile photo for Alfred Dominic Vella

There are many learning algorithms besides neural networks. In fact not all neural networks are actually networks of neurons but because of marketing of ideas to funding bodies there is a tendency to call things whatever the latest fad is.

Machine learning, and neural networks, go back to the 1950s but we have not yet got a settled set of good algorithms.

You can get an overview from

https://www.coursera.org/specializations/machine-learning

A Tour of Machine Learning Algorithms .

Essentials of Machine Learning Algorithms (with Python and R Codes)

I have personally tried many algorithms including neural networks, genetic algorithms, ant colony optimisation, regression and decision trees.

The first one I tried was based on MENACE (see below) and it is still one of my favourites. I used a computer, but you do not need one ;)

I have also examined many PhD theses on ML too and each makes an improvement on what came before.

We are, however, still a long way from understanding learning, machine and human.

Menace: the Machine Educable Noughts And Crosses Engine - Chalkdust

Profile photo for Dawn Naugle

Yes, there is evidence to suggest that recurrent neural networks are universal approximators. This means that, given enough hidden units, they can approximate any continuous function on a bounded domain to an arbitrary degree of accuracy.
