
Design patterns in data mining and machine learning projects help structure workflows and solve common problems effectively. Here are some widely recognized patterns:
1. Data Preparation Patterns
- Data Collection: Gathering data from various sources (APIs, databases, web scraping).
- Data Cleaning: Handling missing values, outliers, and inconsistencies.
- Data Transformation: Normalization, scaling, and encoding categorical variables (see the sketch below).
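A minimal scikit-learn sketch of the data preparation patterns above (imputation for cleaning, scaling and one-hot encoding for transformation). The column names are hypothetical; treat this as a sketch rather than a prescription:

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_features = ["age", "income"]          # hypothetical column names
categorical_features = ["city", "device"]

preprocess = ColumnTransformer([
    ("numeric", Pipeline([
        ("impute", SimpleImputer(strategy="median")),   # data cleaning
        ("scale", StandardScaler()),                    # data transformation
    ]), numeric_features),
    ("categorical", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_features),
])
# preprocess.fit_transform(raw_dataframe) would return the model-ready matrix.
```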
2. Modeling Patterns
- Train/Test Split: Dividing the dataset into training and testing sets to evaluate model performance.
- Cross-Validation: Using techniques like k-fold cross-validation to ensure the model's robustness (see the sketch after this list).
- Ensemble Methods: Combining multiple models (e.g., bagging, boosting) to improve predictions.
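A minimal scikit-learn sketch of the train/test split and cross-validation patterns, using a bagging ensemble (random forest) as the model. The dataset and hyperparameters are arbitrary illustrative choices:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Train/test split pattern: hold out data the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Cross-validation pattern: k-fold evaluation on the training portion only.
model = RandomForestClassifier(n_estimators=200, random_state=0)  # bagging ensemble
scores = cross_val_score(model, X_train, y_train, cv=5)
print("CV accuracy:", scores.mean(), "+/-", scores.std())

model.fit(X_train, y_train)
print("Held-out accuracy:", model.score(X_test, y_test))
```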
3. Evaluation Patterns
- Metrics Selection: Choosing appropriate metrics (accuracy, precision, recall, F1-score, AUC-ROC) based on the problem type (classification, regression).
- Error Analysis: Systematically examining model errors to identify areas for improvement.
4. Deployment Patterns
- Model Serving: Building APIs or microservices to serve predictions from trained models (see the sketch after this list).
- Versioning: Managing different versions of models and datasets for reproducibility and rollback.
- Monitoring: Implementing logging and monitoring to track model performance in production.
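A minimal model-serving sketch using Flask and joblib as one possible stack (any web framework works similarly). The artifact path "model-v3.joblib" and the version tag are hypothetical, and the logging call stands in for real monitoring:

```python
from flask import Flask, jsonify, request
import joblib

app = Flask(__name__)
model = joblib.load("model-v3.joblib")   # hypothetical versioned artifact

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]
    prediction = model.predict([features])[0]
    app.logger.info("prediction served")              # monitoring hook
    return jsonify({"model_version": "v3", "prediction": float(prediction)})

if __name__ == "__main__":
    app.run(port=8080)
```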
5. Feedback Loops
- Active Learning: Iteratively training models using new data points that are uncertain or misclassified.
- Retraining: Periodically updating models with new data to maintain performance over time.
6. Data Pipeline Patterns
- Batch Processing: Handling large volumes of data in batches for processing (e.g., ETL processes).
- Stream Processing: Real-time data processing for continuous input (e.g., using tools like Apache Kafka or Spark Streaming).
7. Experimentation Patterns
- A/B Testing: Comparing two versions of a model or system to determine which performs better.
- Hyperparameter Tuning: Systematically searching for the best hyperparameters using techniques like grid search or Bayesian optimization (see the sketch below).
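A minimal grid-search sketch with scikit-learn; the estimator and parameter grid are arbitrary examples chosen for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.001]}
search = GridSearchCV(SVC(), param_grid, cv=5)    # exhaustive search over the grid
search.fit(X, y)
print(search.best_params_, search.best_score_)
```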
8. Documentation and Collaboration
- Documentation: Maintaining clear documentation for data sources, model decisions, and workflows.
- Version Control: Using systems like Git for managing code and experiment versions collaboratively.
Summary
These patterns provide a structured approach to tackle challenges in data mining and machine learning projects. Adopting these patterns can lead to more efficient, reproducible, and scalable solutions.
The beauty of machine learning is that in almost any area, you should be able to find a problem where it would be interesting to try to apply machine learning. Recent years' course projects from Andrew Ng's CS229 class at Stanford are a good example of this. There is a lot of breadth:
- http://cs229.stanford.edu/projects2010.html
- http://cs229.stanford.edu/projects2011.html
So it really depends on your interests, and the best thing would be for you to design a problem that you're interested in, then start trying different approaches for solving it. Having said that, here are some places I would think about starting if I were in your shoes. (Note: if you add comments or clarifications, I will update this answer).
Cool data sets (this is just a tiny subset):
- A huge database of pretty well-labeled images: http://www.image-net.org/
- Data from perhaps the most popular object classification/detection/segmentation competition: http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2011/index.html#introduction
- Try something a bit different -- predict the results of the college basketball "March Madness" tournament: http://blog.smellthedata.com/2011/03/official-2011-march-madness-predictive.html
Tools:
- I strongly recommend downloading and playing with Theano; it will allow you to experiment with more variations of a model with much less pain: http://deeplearning.net/software/theano/
Other:
- I've written about a related topic here, and a few people have added to the discussion in the comments: http://blog.smellthedata.com/2010/07/choosing-first-machine-learning-project.html
A large number of IT professionals 💼 in the software field are transitioning to Data Science roles. This is one of the biggest tech shifts happening in IT in the last 20 years. If you're a working professional reading this post, you've likely witnessed this shift in your own company. Multiple Data Science courses are available online to help you gain expertise in Data Science.
Logicmojo is one of the better online platforms among them, offering live Data Science and AI certification courses for working professionals who wish to upskill 🚀 their careers or transition into a Data Scientist role. They focus on two key aspects:
✅ Teaching candidates advanced Data Science and ML/AI concepts, followed by real-time projects. These projects add significant value to your resume.
✅ Assisting candidates in securing job placements through their job assistance program for Data Scientist or ML Engineer roles in product companies.
Once you have a solid portfolio of Data Science projects on your resume 📝, you'll get interview calls for Data Scientist or ML Engineer roles in product companies.
So, to secure a job in IT companies with a competitive salary 💰💸, it's crucial for software developers, SDEs, architects, and technical leads to include Data Science and Machine Learning skills in their skill set 🍀✨. Those who align their skills with the current market will thrive in IT for the long term with better pay packages.
In the last few years, software engineering roles have decreased 📉 by 70% in the market, and many MAANG companies are laying off employees because they are now incorporating Data Science and AI into their projects. On the other hand, roles for Data Scientists, ML Engineers, and AI Engineers have increased 📈 by 85% in recent years, and this growth is expected to continue exponentially.
Self-paced preparation 👩🏻💻 for Data Science might take many years ⌛, as learning all the new tech stacks from scratch requires a lot of time. Technical knowledge alone is not enough 🙄; you also need hands-on experience in live projects that you can showcase on your resume 📄, since it is largely on the strength of this project experience that you will be shortlisted for Data Scientist roles. So, if you want a structured way of learning Data Science and Machine Learning/AI, it's important to follow a curriculum that includes multiple projects across different domains.
✅ Logicmojo's Data Science Live Classes offer 12+ real-time projects and 2+ capstone projects. These weekend live classes are designed for working professionals who want to transition from the software field to the Data Science domain 🚀. It is a 7-month live curriculum tailored for professionals, covering end-to-end Data Science topics with practical project implementation. After the course, the Logicmojo team provides mock interviews, resume preparation, and job assistance for product companies seeking Data Scientists and ML Engineers.
So, whether you are looking to switch your current job to a Data Scientist role or start a new career in Data Science, Logicmojo offers live interactive classes with placement assistance. You can also 👉 contact them for a detailed discussion with a senior Data Scientist with over 12+ years of experience. Based on your experience, they can guide you better over a call.
Remember, you need to upgrade 🚀 your tech skills to match the market trends; the market won’t change to accommodate your existing skills.
I find the following patterns described in the Gang of Four book quite useful for building object-oriented ML software.
Facade - a simple, client-friendly interface that hides a more complex system of objects. It is useful to think of, say, a Neural Network or a Topic Model as just a facade: an interface hiding more general, well-factored functionality. This way, all the functionality is not packed into one class of limited reusability.
Strategy - a common interface for a family of related algorithms (a minimal sketch follows below).
Factory - an interface for assembling composite objects, such as algorithms using more than one model and relying on data such as dictionaries, indexes, and so on.
Adapter - for handling various ML APIs in a unified way: create a unifying interface and adapt to it.
Decorator - for modifying functionality, for example adding caching to an algorithm.
I will assume that you are not talking about building software to implement machine learning algorithms from scratch, but rather about packaging ML libraries into a data pipeline (R, Python scikit-learn, Spark...).
Some patterns:
Async processing using queues
Machine learning systems are complex and can be quite unpredictable in terms of latency. Depending on your data, optima can be found more or less easily. You also want to build a system that is able to A/B test different algorithms, so latency may vary with their complexity (logistic regression vs. random forest, for example). Finally, the more you get into very complex algorithms (NN, stacking...), the less your workload will look synchronous. Using queues such as Kafka also helps you build distributed systems, so you can add more workers if your data is big enough.
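A minimal sketch of the pattern, using Python's in-process `queue.Queue` and threads as a stand-in for a real broker such as Kafka or RabbitMQ; the payloads and the simulated latencies are made up for illustration:

```python
import queue
import random
import threading
import time

requests = queue.Queue()   # in production this would be a Kafka/RabbitMQ topic
results = queue.Queue()

def prediction_worker(worker_id):
    while True:
        payload = requests.get()
        if payload is None:                       # poison pill: stop the worker
            requests.task_done()
            break
        time.sleep(random.uniform(0.01, 0.2))     # model latency varies per request
        results.put((worker_id, payload["request_id"], "prediction"))
        requests.task_done()

workers = [threading.Thread(target=prediction_worker, args=(i,)) for i in range(3)]
for w in workers:
    w.start()

for i in range(10):                               # producers never block on the model
    requests.put({"request_id": i})
for _ in workers:
    requests.put(None)
requests.join()                                   # wait until everything is processed
print(results.qsize(), "predictions ready")
```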
Hidden Feedback Loop & data channels
This is for systems where your prediction has an influence on its own verification.
The hidden feedback loop is a well-known phenomenon explained by the fact that if you influence an experiment, you can't learn from it at the same time. A classic example is the police patrol recommendation system. If you always predict that something is going to happen somewhere and send police there, they will only be able to arrest people there and not in other zones. Your predictions will then be confirmed, your algorithm will learn from that and will become more and more biased. A solution to this phenomenon is to split your data into two or more channels: the data you learn from and the data you predict on. For the data you learn from, you make a random prediction, so you don't get biased, and see its result later to learn from it. For the other set of the data, you make your best prediction to maximize ROI/you name it. You then need a system able to route your data to various algorithms (random, algorithm A, algorithm B...), and to split it on some conditions (10% of the input, balanced sampling between targets or variables...). A good way to implement that is... queues.
Hashing trick and handling novelty
The hashing trick is a VERY handy solution for machine learning architectures (it's even better than queues... imagine). Basically, when you train a machine learning algorithm and want to get predictions from it, it will wait for a defined number of variables. This number is precisely the number of variables that it has seen on the training set. What if a new variable comes into your system after an upgrade? Or a new value for a categorical variable? The hashing trick is a very robust solution to that. The idea is that you define a number of variables that will come into your system (200,000 for example), you cast every single datum to an int (for example, if the next row describes someone aged 22, you create the string age_22 and hash it to an int), then you compute that int modulo 200,000. The number you get is the index of the column for your value, and you put a 1 in this column. On the other side you get sparse data, which may not be the best for some algorithms like decision trees...
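A minimal sketch of the trick in Python. The feature names, the 200,000 width, and the choice of MD5 as the hash are illustrative assumptions, not a prescription:

```python
import hashlib

FEATURE_DIM = 200_000   # fixed vector width chosen up front, as described above

def hash_features(raw_features, dim=FEATURE_DIM):
    """Map raw name=value pairs onto a fixed-width sparse feature vector."""
    columns = {}
    for name, value in raw_features.items():
        token = f"{name}_{value}".encode()                    # e.g. b"age_22"
        col = int(hashlib.md5(token).hexdigest(), 16) % dim   # deterministic hash
        columns[col] = columns.get(col, 0) + 1                # collisions simply add up
    return columns

# A brand-new variable or category just hashes to some column; nothing breaks.
print(hash_features({"age": 22, "city": "Paris", "new_variable": "never_seen"}))
```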
This is very handy, also because otherwise you have to transform each categorical variable into a matrix of N columns, with N the number of possible values (for example, a colour variable taking the values red, green, and blue becomes three indicator columns). N is often changing, the order of the variables may also change, and so on... you don't want to manage that. A great ML framework using this trick is Vowpal Wabbit, which is focused on online learning: JohnLangford/vowpal_wabbit
(Fake) Lambda architecture
Unless you go for online learning (updating the model after every example), you will have to re-train your model regularly and cross-validate its performance. On the other hand, you need your model to be able to predict over real-time or simply incoming data. That means you need two layers: one where the data goes for predictions, and one where it is historicized and used for regular trainings. There will be a lot of code in common, such as your processes for feature creation, data cleaning, configuration of the machine learning libraries and so on. There is great literature on the lambda architecture, but the general idea here is to write code that is able to run both in a streaming and in a batch manner.
The "fake" disclaimer is because you don't have a serving layer (and most lambda architectures people talk about are not lambda architectures... but that's for another day!)
Kappa architecture
I like Kappa. It's an architecture proposed to handle the needs I just described above, but without the complexity of a real lambda architecture (writing everything twice in a streaming and a batch framework, building a serving layer that glues the time windows...). The idea is that you only build a streaming architecture and use it both for real-time predictions and to replay big batches.
Questioning the Lambda Architecture
Caching joins for incoming data
A common task in machine learning is to add variables to the example you are trying to predict against. Sometimes it can be quite slow or complex to calculate them, or it can be well above your latency requirements. Preparing these features in advance is a good pattern. Let's say you have a client action coming into your streaming system and you want to add information about their previous purchases (2-month average, maximum purchase in their city...). You can compute these in batch and upload them into a key-value cache (Redis, in-memory hashmap...). You simply join on the client id when the client event arrives in the streaming system.
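A minimal sketch of this pattern with a plain dict standing in for the key-value store (Redis or similar in production); the client id, field names, and values are invented for illustration:

```python
# Batch side: precompute per-client aggregates and load them into a key-value store.
feature_cache = {
    "client_42": {"avg_purchase_2m": 37.5, "city_max_purchase": 410.0},
}

def enrich(event, cache=feature_cache):
    """Join an incoming event with precomputed features by client id."""
    features = cache.get(event["client_id"]) or {}   # constant-time lookup
    return {**event, **features}

# Streaming side: each incoming action is enriched before prediction.
incoming = {"client_id": "client_42", "action": "add_to_cart", "amount": 19.9}
print(enrich(incoming))
```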
Also, generally speaking:
- Keep things simple, use well understood frameworks
- Don't try to scale if you don't need to
- Think about the functional perimeter of your application before thinking about a framework (Spark for example)
- I prefer a well written bunch of python lines rather than a mess of big data technologies put together (kafka spark cassandra elastic search wombocombo)
- Think about how easily your system allows your R&D team to put their models in production. If they code in R and you build custom Scala code for every algorithm they suggest, you won't be very agile. Try to use the same frameworks for R&D and production, and if you don't, try to use standard tools to translate the models from one language to another (PMML for example).

A data science design pattern is very much like a software design pattern or enterprise-architecture design pattern. It is a reusable computational pattern applicable to a set of data science problems having a common structure, and representing a best practice for handling such problems. This page lists our data science design pattern blog posts, most recent first.
Data science design patterns generally mix several computational techniques, and you should study a design pattern thoroughly before applying it. In some cases whole books have been written about a single design pattern.
1. Combining Source Variables -
Variable selection is perhaps the most challenging activity in the data science lifecycle. The phrase is something of a misnomer, unless we recognize that mathematically speaking we’re selecting variables from the set of all possible variables—not just the raw source variables currently available from a given data source. Among these possible variables are many combinations of source variables. When a combination of source variables turns out to be an important variable in its own right, we sometimes say that the source variables interact, or that one variable mediates another. We coin the phrase synthetic variable to mean an independent variable that is a function of several source variables.
2. Handling null values -
There are many techniques for handling nulls. Which techniques are appropriate for a given variable can depend strongly on the algorithms you intend to use, as well as statistical patterns in the raw data—in particular, the missing values' missingness, the randomness of the locations of the missing values. Moreover, different techniques may be appropriate for different variables in a given data set. Sometimes it is useful to apply several techniques to a single variable. Finally, note that corrupt values are generally treated as nulls.
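A minimal pandas sketch of a few common techniques (median imputation plus a missingness indicator for a numeric column, an explicit "unknown" category for a categorical one). The column names and values are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age":   [22, np.nan, 35, 41],
    "city":  ["Paris", "Lyon", None, "Paris"],
    "spend": [10.0, 12.5, np.nan, 9.0],
})

df["age_missing"] = df["age"].isna()                  # keep a missingness indicator
df["age"] = df["age"].fillna(df["age"].median())      # numeric: median imputation
df["city"] = df["city"].fillna("unknown")             # categorical: explicit category
df["spend"] = df["spend"].fillna(df["spend"].mean())  # or model-based imputation instead
print(df)
```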
3. Variable width kernel smoothing -
A fundamental problem in applied statistics is estimating a probability mass function (PMF) or probability density function (PDF) from a set of independent, identically distributed observations. When one is reasonably confident that a PMF or PDF belongs to a family of distributions having closed form, one can estimate the form’s parameters using frequentist techniques such as maximum likelihood estimation, or Bayesian techniques such as acceptance-rejection sampling.
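When no closed-form family fits, a kernel density estimate is the usual nonparametric alternative. Below is a small numpy sketch of a variable-width (adaptive) Gaussian KDE in the spirit of this pattern's title; it is an illustrative implementation, not the one from the cited post, and the pilot bandwidth `h` and sensitivity `alpha` are arbitrary choices:

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def adaptive_kde(samples, grid, h=0.5, alpha=0.5):
    """Variable-width kernel density estimate evaluated on `grid`."""
    samples = np.asarray(samples, dtype=float)
    # Pilot fixed-bandwidth density at each sample point.
    pilot = np.array([
        gaussian_kernel((x - samples) / h).mean() / h for x in samples
    ])
    g = np.exp(np.log(pilot).mean())          # geometric mean of pilot densities
    local_h = h * (pilot / g) ** (-alpha)     # wider kernels where data are sparse
    # Final estimate: average of per-sample kernels with their own widths.
    return np.array([
        (gaussian_kernel((x - samples) / local_h) / local_h).mean() for x in grid
    ])

data = np.random.standard_normal(200)
xs = np.linspace(-4, 4, 100)
density = adaptive_kde(data, xs)
```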
4. Decision Templates -
Recall that a probability density function (PDF) assigns probability mass (relative likelihood) to measurable collections of events over a sample space. A PDF distance function or metric is a distance function on some set of PDFs. For example, consider the set of geometric PDFs. The geometric PDF is defined by the probability of initial success:
p(first success on kth trial) = (1 − p)^(k−1) p.
The distance between two geometric PDFs having respective per-trial success rates p1 and p2 is
d(p1, p2) = |p2 − p1|.
(Trivially, p1 and p2 are real numbers, and absolute value is a distance function on the reals; so d(p1, p2) is also a distance function.)
The concept of an equivalence class arises when one partitions a set S into subsets termed equivalence classes that are pairwise disjoint and collectively exhaustive, so that every element of S is a member of exactly one subset. All elements in a given class are equivalent.
Excellent question. Here are a few that I end up using often:
- Domain specific languages for transforming data to a canonical form that can be digested by the decisioning system
- Oh and canonical data models, to support above
- For batch classification and regression, a very parallel cross validation system. An ideal use case for the on demand cloud
I will chew on this some more and add as newer things come up
I don't know of any, although that doesn't mean they don't exist.
There are workflow guidelines for solving a problem through the analysis of data (let's use this as the basics of solving a data science problem - there's often more that we'll ignore to keep it simple). But since there are so many problems, so many different kinds of solutions (e.g. ML algorithms, statistical models, etc.), and even more kinds of data, they have to be taken with a grain of salt. In any given situation, things change, and the plan has to be adapted. Of course, you could have a workflow for all problems, but it would be so general it wouldn't tell you much.
Since much of the use of machine learning is solving data science (and related) problems, let's call "machine learning" my personal favourite part of machine learning: research and development. And there are ways to do this, generically speaking, methods in which to approach this. But no design patterns.
There might be data mining processes, guidelines, or best practices (e.g., Shearer (2000), Data Mining Principles and Best Practice (SAS), Data Mining Best Practices (Canadian Marketing Association)), but I don't think there are a lot of design patterns for data mining out there. You might want to look into this paper, which presents 2 data mining patterns: A Pattern Based Data Mining Approach. It also presents some ideas for potential design patterns.
References:
Delibašić, B., Kirchner, K., & Ruhland, J. (2008). A pattern based data mining approach. In Data Analysis, Machine Learning and Applications (pp. 327-334). Springer Berlin Heidelberg.
Shearer, C. (2000). The CRISP-DM model: the new blueprint for data mining. Journal of data warehousing, 5(4), 13-22.
I have been reading good things about Kaggle. You can try solving problems in some of their contests : http://www.kaggle.com/competitions
Hard to classify this as a "pattern" per se, maybe "heuristic" would be more suitable, but "follow the data" is an important way to proceed. That is, machine learning should help one understand where the "relevance" exists in the data, as it is unlikely in real world data that such info...
You may have heard about the Cross Industry Standard Process for Data Mining (CRISP-DM).
It's really a simple way of thinking about things, but it helps you to identify the tasks and resources necessary for data mining projects.
I really think that obtaining a well-expressed question from a business expert is the key to a good data mining project. There are plenty of methods and good statisticians who may be able to answer a question, once it's well explained and, of course, assuming you have the data.
Lots of good answers already - however the question is such that I think perhaps a business rather than technical description might be warranted.
First things first, doing stuff with data, whatever you want to call it, is going to require some investment. Fortunately the entry price has come right down, and you can do pretty much all of this at home with a reasonably priced machine and online access to a host of free or purchased resources. Commercial organizations have realized that there is huge value hiding in the data and are employing the techniques you ask about to realize that value. Ultimately what all of this work produces is insights, things that you may not have known otherwise. Insights are the items of information that cause a change in behavior.
Let's begin with a real world example, looking at a farm that is growing strawberries (here's a simple backgrounder: The Secret Life Of California's World-Class Strawberries, this High-Tech Greenhouse Yields Winter Strawberries, and this Growing Strawberry Plants Commercially)
What would a farmer need to consider if they are growing strawberries? The farmer will be selecting the types of plants, fertilizers, and pesticides. They will also be looking at machinery, transportation, storage, and labor. Weather, water supply, and pestilence are also likely concerns. Ultimately the farmer is also investigating the market price, so supply and demand and the timing of the harvest (which will determine the dates to prepare the soil, to plant, to thin out the crop, to nurture, and to harvest) are also concerns.
So the objective of all the data work is to create insights that will help the farmer make a set of decisions that will optimize their commercial growing operation.
Let's think about the data available to the farmer, here's a simplified breakdown:
1. Historic weather patterns
2. Plant breeding data and productivity for each strain
3. Fertilizer specifications
4. Pesticide specifications
5. Soil productivity data
6. Pest cycle data
7. Machinery cost, reliability, fault and cost data
8. Water supply data
9. Historic supply and demand data
10. Market spot price and futures data
Now to explain the definitions in context (with some made-up insights, so if you're a strawberry farmer, this might not be the best set of examples):
Big Data: Using all of the data available to provide new insights to a problem. Traditionally the farmer may have made their decisions based on only a few of the available data points, for example selecting the breeds of strawberries that had the highest yield for their soil and water table. The Big Data approach may show that the market price slightly earlier in the season is a lot higher and local weather patterns are such that a new breed variation of strawberry would do well. So the insight would be that switching to a new breed would allow the farmer to take advantage of higher prices earlier in the season, and the cost of labor, storage and transportation at that time would be slightly lower. There's another thing you might hear in the Big Data marketing hype: Volume, Velocity, Variety, Veracity. There is a huge amount of data here, a lot of data is being generated each minute (weather patterns, stock prices and machine sensors), the data is liable to change at any time (e.g. a new source of social media data that is a great predictor for consumer demand), and not all of it is equally trustworthy.
Data Analysis: Analysis is really a heuristic activity, where scanning through all the data the analyst gains some insight. Looking at a single data set - say the one on machine reliability, I might be able to say that certain machines are expensive to purchase but have fewer general operational faults leading to less downtime and lower maintenance costs. There are other cheaper machines that are more costly in the long run. The farmer might not have enough working capital to afford the expensive machine and they would have to decide whether to purchase the cheaper machine and incur the additional maintenance costs and risk the downtime or to borrow money with the interest payment, to afford the expensive machine.
Data Analytics: Analytics is about applying a mechanical or algorithmic process to derive the insights for example running through various data sets looking for meaningful correlations between them. Looking at the weather data and pest data we see that there is a high correlation of a certain type of fungus when the humidity level reaches a certain point. The future weather projections for the next few months (during planting season) predict a low humidity level and therefore lowered risk of that fungus. For the farmer this might mean being able to plant a certain type of strawberry, higher yield, higher market price and not needing to purchase a certain fungicide.
Data Mining: this term was most widely used in the late 90's and early 00's when a business consolidated all of its data into an Enterprise Data Warehouse. All of that data was brought together to discover previously unknown trends, anomalies and correlations such as the famed 'beer and diapers' correlation (Diapers, Beer, and data science in retail). Going back to the strawberries, assuming that our farmer was a large conglomerate like Cargill, then all of the data above would be sitting ready for analysis in the warehouse so questions such as this could be answered with relative ease: What is the best time to harvest strawberries to get the highest market price? Given certain soil conditions and rainfall patterns at a location, what are the highest yielding strawberry breeds that we should grow?
Data Science: a combination of mathematics, statistics, programming, the context of the problem being solved, ingenious ways of capturing data that may not be being captured right now plus the ability to look at things 'differently' (like this Why UPS Trucks Don't Turn Left ) and of course the significant and necessary activity of cleansing, preparing and aligning the data. So in the strawberry industry we're going to be building some models that tell us when the optimal time is to sell, which gives us the time to harvest which gives us a combination of breeds to plant at various times to maximize overall yield. We might be short of consumer demand data - so maybe we figure out that when strawberry recipes are published online or on television, then demand goes up - and Tweets and Instagram or Facebook likes provide an indicator of demand. Then we need to align demand data up with market price to give us the final insights and maybe to create a way to drive up demand by promoting certain social media activity.
Machine Learning: this is one of the tools used by data scientists, where a model is created that mathematically describes a certain process and its outcomes; the model then provides recommendations, monitors the results once those recommendations are implemented, and uses the results to improve the model. When Google provides a set of results for the search term "strawberry", people might click on the first 3 entries and ignore the 4th one - over time, that 4th entry will not appear as high in the results because the machine is learning what users are responding to. Applied to the farm, when the system creates recommendations for which breeds of strawberry to plant, and collects the results on the yields for each berry under various soil and weather conditions, machine learning will allow it to build a model that can make a better set of recommendations for the next growing season.
I am adding this next one because there seems to be some popular misconceptions as to what this means. My belief is that 'predictive' is much overused and hyped.
Predictive Analytics: Creating a quantitative model that allows an outcome to be predicted based on as much historical information as can be gathered. In this input data, there will be multiple variables to consider, some of which may be significant and others less significant in determining the outcome. The predictive model determines what signals in the data can be used to make an accurate prediction. The models become useful if there are certain variables that can be changed to increase the chances of a desired outcome. So what might our strawberry farmer want to predict? Let's go back to the commercial strawberry grower who is selling product to grocery retailers and food manufacturers - the supply deals are in the tens and hundreds of thousands of dollars and there is a large salesforce. How can they predict whether a deal is likely to close or not? To begin with, they could look at the history of that company and the quantities and frequencies of produce purchased over time, the most recent purchases being stronger indicators. They could then look at the salesperson's history of selling that product to those types of companies. Those are the obvious indicators. Less obvious ones would be which competing growers are also bidding for the contract; perhaps certain competitors always win because they always undercut. How many visits has the rep paid to the prospective client over the year, how many emails and phone calls? How many product complaints has the prospective client made regarding product quality? Have all our deliveries been the correct quantity, delivered on time? All of these variables may contribute to the next deal being closed. If there is enough historical data, we can build a model that will predict whether a deal will close or not. We can use a sample of the historic data set aside to test whether the model works. If we are confident, then we can use it to predict the next deal.
[Update June 19, 2017 - just discovered: Farmers Business Network (FBN) Farmers Business Network is proudly Farmers First SM. Created by farmers for farmers, FBN is an independent and unbiased farmer-to-farmer network of thousands of American farms. FBN democratizes farm information by making the power of anonymous aggregated analytics available to all FBN members. The FBN Network helps level the playing field for independent farmers with unbiased information, profit enhancing farm analysis, and network buying power.]
- MapReduce - A programming model for processing large data sets in parallel across a cluster of computers (a minimal single-machine sketch follows this list).
- Lambda Architecture - A design pattern for building big data systems that balances the need for low latency and high throughput.
- Data Partitioning - The process of dividing a large data set into smaller, manageable pieces for parallel processing.
- Distributed File Systems - Used to store and manage large amounts of data across multiple nodes in a distributed manner.
- Stream Processing - A design pattern for processing data in real-time as it is generated, rather than in batch mode.
- CQRS (Command Query Responsibility Segregation) - A pattern that separates the responsibilities of reading and writing data in a big data system.
- Microservices Architecture - A design pattern that breaks down a monolithic application into smaller, independent services that communicate through APIs.
- Materialized Views - A precomputed data structure that stores the results of a query for fast and efficient retrieval.
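As an illustration of the MapReduce pattern mentioned above, here is a minimal single-machine sketch in Python: the "cluster" is just a process pool and the input chunks are made up, but the map and reduce roles are the same.

```python
from collections import Counter
from functools import reduce
from multiprocessing import Pool

def map_phase(chunk):
    # Emit per-chunk word counts (the "map" step).
    return Counter(chunk.split())

def reduce_phase(left, right):
    # Merge partial counts (the "reduce" step).
    left.update(right)
    return left

if __name__ == "__main__":
    chunks = ["big data systems", "big data pipelines", "stream processing systems"]
    with Pool() as pool:
        partial_counts = pool.map(map_phase, chunks)   # mappers run in parallel
    totals = reduce(reduce_phase, partial_counts, Counter())
    print(totals.most_common(3))
```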
Well, there are various subtypes, but the main types of machine learning models for data analysis and data mining are:
- For data mining:
- Predictive
- Descriptive
- For data analysis:
- Statistical modeling
Every kind of modeling related to data analysis and data mining is of one of the above types.
Hope this helps.
I tend to use these design patterns in the game engine I'm developing:
* Singleton Design Pattern
* Strategy Design Pattern
* Observer Design Pattern
* Composite Design Pattern
* Model-View-Controller Design Pattern
Here is a brief explanation of each pattern:
Singleton Design Pattern
In a game engine, just like in a movie, there should be only one director. A director is a class that conducts everything that happens in a game. It controls the rendering of an object. It controls position updates. It directs the player’s input to the correct game character, etc.
The engine should prevent more than one instance of a director from being created, and it does so through the Singleton Design Pattern. This design pattern ensures that one and only one object is instantiated for a given class.
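A minimal Python sketch of the idea; the `Director` class and its state are invented for illustration (the engine described here is presumably not written in Python):

```python
class Director:
    """Game-engine 'director': only one instance is ever created."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.entities = []   # state shared by every caller
        return cls._instance

    def register(self, entity):
        self.entities.append(entity)

assert Director() is Director()   # both calls return the same object
```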
Strategy Design Pattern
In a game, you should always decouple the interaction between the input controller and the game's logic. The game's logic should receive the same kind of input regardless of the input controller (button, gesture, joystick).
Although each input controller behaves differently to the user, they must provide the same data to the game's logic. Furthermore, adding or removing an input controller should not crash a game.
This decoupling behavior and flexibility are possible thanks to a design pattern known as Strategy Design Pattern. This design pattern provides versatility to your game by allowing it to change behavior dynamically without the need of modifying the game's logic.
Observer Design Pattern
In a game, all of your classes should be loosely coupled. It means that your classes should be able to interact with each other but have little knowledge of each other. Making your classes loosely coupled makes your game modular and flexible, letting you add features without adding unintended bugs. The Observer Design Pattern provides such functionality.
The Observer pattern is implemented when an object wants to send messages to its subscriber (other objects). The object does not need to know anything about how the subscribers work, just that they can communicate.
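A minimal sketch of the Observer pattern in Python; the subject, the subscribers, and the event payload are invented for illustration:

```python
class Subject:
    """Notifies loosely coupled subscribers without knowing their internals."""
    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def notify(self, event):
        for callback in self._subscribers:
            callback(event)           # subscribers decide what to do with the event

score_events = Subject()
score_events.subscribe(lambda e: print("HUD update:", e))
score_events.subscribe(lambda e: print("Achievements check:", e))
score_events.notify({"points": 100})
```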
Composite Design Pattern
A game typically consists of many views. There is the main view where the characters are rendered. There is a sub-view where player's points are shown. There is a sub-view which shows the time left in a game. If you are playing the game on a mobile device, then each button is a view.
Maintainability should be a significant concern during game development. Each view should not have different function names or different access points. Instead, you want to provide a unified access point to every view; the same function call should be able to access either the main view or a sub-view.
This unified access point is possible with a Composite Design Pattern. This pattern places each view in a tree-like structure, thus providing a unified access point to every view. Instead of having a different function to access each view, the same function can access any view.
Mod...
In my experience:
A certain few of the GOF design patterns show up a lot, in most cases programmers use them without consciously deciding to “use a design pattern”.
- singleton: most often this is nothing more than a global variable in OOP clothing - a static method to get it. People will always need global variables.
- iterator pattern: used in almost every standard library with containers, and that’s not counting “incremented pointers”, even though that often can provide a similar interface.
- strategy pattern: often using some form of language-provided dynamic dispatch (virtual, or functor). Picking behavior at run time is a very powerful tool, even if at some level it just boils down to branching, switching, or looking up.
- factory pattern: essentially just the strategy pattern applied to object creation, usually achieved in the same ways (a small sketch appears at the end of this answer).
- observer pattern: nearly every game engine or other system with “tasks” will have something similar. Some forms of reference counting and concurrency even imply its use.
- visitor pattern: the popularity of observer and strategy pattern often results in many uses of the visitor pattern: “I have a list of objects I need to inform of an event, but the objects are generic. I’ll have them all share some common interface so I can iterate over them and call the right method for each.”
The rest absolutely do show up from time to time, and some probably should show up more often than they do, but these ones have made it into almost every piece of software. Maybe the command pattern deserves an honorary mention for its applicability to HTTP requests and REST APIs.
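To make the point above that a factory is essentially the strategy pattern applied to object creation concrete, here is a small hedged sketch; the enemy classes and the lookup table are invented for the example:

```python
# A minimal sketch of a factory as run-time dispatch over constructors.
class Grunt:
    def __init__(self):
        self.hp = 10

class Boss:
    def __init__(self):
        self.hp = 100

ENEMY_FACTORY = {"grunt": Grunt, "boss": Boss}   # creation chosen by lookup

def spawn(kind):
    return ENEMY_FACTORY[kind]()                 # pick the constructor at run time

wave = [spawn(k) for k in ["grunt", "grunt", "boss"]]
print([type(e).__name__ for e in wave])          # ['Grunt', 'Grunt', 'Boss']
```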
The problems with neural networks are
- they are bad at explaining how they arrived at their solution
- it is hard to extract that from the weights and topology of the NN.
There are of course more forms of machine learning that can do better on these points.
Without a doubt, machine learning can discover (greedy) algorithms to solve certain problems with 100% guaranteed success (given enough time and electricity etc), or can discover new strategies, heuristics or methods that (often) converge quickly and give good approximations otherwise.
This does not apply only to design patterns, but to a much wider range of problems. Any problem that can be cracked with intelligence will benefit from having more of the latter.
There are problems for which not only no good algorithms are known, but for which we think such algorithms do not exist. You can only solve these by exploring the entire solution space. Intelligence will be of little help here. Possible examples are: finding useful patterns in prime numbers or in the decimals of PI or e (the base of the natural logarithm).
The term “pattern” is rather generic.
For example, you don’t have to be a scientist to discern the pattern in the sequence 100000, 010000, 001000, 000100, 000010 and 000001. In this trivial example the term pattern makes perfect sense. Sometimes this trivial example generalizes: for ham/spam classification, a simple pattern like the count of certain trigger tokens (sometimes called a “majority rule”) is often an effective method for separating ham from spam. A toy version is sketched below.
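As a toy illustration of that count-based “majority rule” idea (the trigger words, threshold, and messages are all made up), a classifier can be as simple as:

```python
# A toy count-based ham/spam rule, as described above.
SPAM_TOKENS = {"winner", "free", "prize"}

def looks_like_spam(message, threshold=2):
    tokens = message.lower().split()
    hits = sum(tok.strip("!.,") in SPAM_TOKENS for tok in tokens)
    return hits >= threshold          # enough trigger words -> call it spam

print(looks_like_spam("You are a winner, claim your free prize!"))  # True
print(looks_like_spam("Lunch at noon tomorrow?"))                   # False
```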
For a highly complex model, one with many degrees of freedom (lots of parameters), a pattern is lost to the eye, and an argument using model abstractions in terms of the algorithm, cost function, etc. is an apt substitute. Often knowledge representation, encoding, or feature generation is a fundamental part of the algorithm, in which case the pattern is hidden behind additional levels of indirection (“a pointer to a pointer to a pointer”).
Frankly, I’m not enamored with “patterns”. In ML I prefer just “algorithm”, and in statistics I prefer “model”. Often a model is of a hierarchical nature, in which case the plural “models” is apt. Now we’re in danger of falling into an infinite descent, but I am no philosopher and am fine with our standard metaphors (as long as we know what we’re talking about).
I had the same question on my mind exactly a year ago. One of the most difficult tasks, keeping in mind your aim to publish a paper within two months, is to formulate a sound problem statement. A lot of groundwork goes into coming up with a reasonable and new problem statement that could be solved.
The only way to come up with a problem statement is to read voraciously about the topic concerned. You have to keep reading research papers and good blogs, solve problems on Kaggle, etc., to understand the subject better. While you are pursuing all of the above, you might come across a new problem or a new, improved way to solve an existing one. So I would suggest that your sole focus should be to enhance your skills in these subjects; the project idea will strike you sooner or later.
Though this does not answer your question directly, it is the only way to find a research project: you have to spend a lot of time researching!
All the best :)
Sequential pattern mining is a data mining task specialized for analyzing sequential data to discover sequential patterns. More precisely, it consists of discovering interesting subsequences in a set of sequences, where the interestingness of a subsequence can be measured in terms of various criteria such as occurrence frequency and length.
To do sequential pattern mining, a user must provide a sequence database and specify a parameter called the minimum support threshold. This parameter indicates the minimum number of sequences in which a pattern must appear to be considered frequent.
Sequential pattern mining is useful for analyzing sequential data. Some classic algorithms are PrefixSpan, SPADE and GSP. A small support-counting sketch follows.
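Here is a small hedged sketch of the support count at the heart of those algorithms; the sequence database, the candidate pattern and the threshold are invented for illustration:

```python
# Counting the support of one candidate sequential pattern in a toy
# sequence database (items must appear in order, not necessarily contiguously).
def is_subsequence(pattern, sequence):
    it = iter(sequence)
    return all(item in it for item in pattern)

sequence_db = [          # each entry is one ordered sequence, e.g. purchases
    ["a", "b", "c", "d"],
    ["a", "c", "b", "d"],
    ["b", "a", "c"],
    ["a", "d"],
]
candidate = ["a", "c"]
min_support = 2          # minimum support threshold (absolute count)

support = sum(is_subsequence(candidate, seq) for seq in sequence_db)
print(support)                                        # 3
print("frequent" if support >= min_support else "infrequent")
```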
Hope it's useful…
I had been wanting to take a stab at this one for a few days, but it always looked like an enormous task, because this question uses so many words. In addition, this is a question on which a lot of people have their eyes, and a lot of others have already written elaborate answers.
Let me first re-order all the important words:
- Big data
- Data mining
- Data analysis
- Analytics
- Machine learning
- Data science
Imagine that you want to become a data scientist, and work in a big organization like Amazon, Intel, Google, FB, Apple and so on.
What would that look like?
- You would have to deal with big data; you would have to write computer programs in SQL, Python, R, C++, Java, Scala, Ruby…and so on, just to maintain big-data databases. You would be called a database manager.
- As an engineer working on process control, or someone wanting to streamline the operations of the company, you would perform Data Mining and Data Analysis. You may use simple software to do this, where you only run code written by others, or you may write your own elaborate code in SQL, Python, or R to do data mining, data cleaning, data analysis, modeling, predictive modeling, and so on.
- All of this will be called Analytics. Several software packages exist to do this. One popular one is Tableau; some others are JMP and SAS. A lot of people do everything online, where an SAP-based business intelligence setup can be used. Here, simple reporting can be done easily.
- Further, you would then be able to use machine learning to derive conclusions, and come up with predictions, wherever analytical answers are not possible. Think of analytical answers as [If/then] type of computer programs, where all the input conditions are already known, and only a few parameters change.
- Machine learning uses statistical analysis to partition data. An example would be this: read the comments written by various people on Yelp, and predict from the comments whether the person would have given a restaurant 4 stars or 5 stars (a small sketch of this appears after this list).
- If that is not enough, you would be able to use deep learning as well. Deep learning is used to process data such as musical files, images, even text data such as natural languages, where data are enormous, but their type is very diverse.
- You would use everything to your advantage ~ analytical solutions, partitioning data, hacking mindset, automation by programming, reporting, deriving conclusions, making decisions, taking actions, and telling stories about your data.
- Last but not least, a part of this will happen on cruise control, where you may not be there physically, but the programs you have created would do most of the work themselves. If you take it to the level of AI, one day it may get smarter than you; needless to say, it would already be faster than you. One day it can reach the level where it surprises you with solutions you may not even have imagined.
- Now you are a data scientist, and what you would do is called Data-science.
- Whatever you would do may or may not be seen by people outside your company such as people asking Alexa various questions if you work for Amazon, or people asking questions to ok Google if you work for Google. Or they may not be getting to see anything you do. Your functions would be helping the companies engineer things better.
- To do all this, you may need lots of expertise in handling data and knowledge of a few programming languages.
- One popular data science Venn diagram I have seen on the internet shows that a data scientist sits at the intersection of a lot of things: communication, statistics, programming, and business.
- Read also:
- Rohit Malshe's answer to How do I learn machine learning?
- Rohit Malshe's answer to How should I start learning Python?
- Rohit Malshe's answer to What is deep learning? Why is this a growing trend in machine learning? Why not use SVMs?
- Rohit Malshe's answer to Are ‘curated paths to a Data Science career’ on Coursera worth the money and time?
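Here is a small hedged sketch of the Yelp-style idea from the list above: predict 4-star vs 5-star from review text. The reviews and labels are invented, and the scikit-learn baseline shown is just one reasonable starting point, not the only approach.

```python
# A minimal text-classification baseline for the "predict the star rating
# from the review text" example. The data below is made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reviews = [
    "Great food and friendly staff",
    "Pretty good, slightly slow service",
    "Amazing desserts, will come back",
    "Nice place but a bit noisy",
]
stars = [5, 4, 5, 4]

# Bag-of-words features plus a linear classifier: a common first baseline.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(reviews, stars)

print(model.predict(["Wonderful food, loved everything"]))
```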
In all seriousness, if you want elaborate documentation on all this, I would suggest you go ahead and read this McKinsey report to get a full understanding. I only extracted a few sections out of it because I wanted to add on top of someone else’s knowledge and put these concepts together like a story, so as to inspire people to think about this subject and begin their own journeys.
Big data: The next frontier for innovation, competition, and productivity
I will answer a few questions step by step, and wherever possible I will give a few pictures or plots to show you what things look like.
McKinsey consultants! You are amazing, so if you read things written in this answer that were typed by you at some point in time, I give full credit to you.
- What do we mean by "big data"?
- “Big data” refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze. This definition is intentionally subjective and incorporates a moving definition of how big a dataset needs to be in order to be considered big data—i.e., we need not define big data in terms of being larger than a certain number of terabytes (thousands of gigabytes). We assume that, as technology advances over time, the size of datasets that qualify as big data will also increase. Also note that the definition can vary by sector, depending on what kinds of software tools are commonly available and what sizes of datasets are common in a particular industry. With those caveats, big data in many sectors today will range from a few dozen terabytes to multiple petabytes (thousands of terabytes).
- What is a typical size of data I may have to deal with? Sometimes GBs, sometimes just a few MBs, sometimes up to as high as 1TB. Sometimes the complexity is nothing. The data may be representing the same thing. Sometimes the complexity can be very high. I might have a giant file full of a lot of data and logs which can be structured or unstructured.
- Think for example about Macy’s. There are thousands of stores, selling thousands of items per day to millions of customers. If Macy’s wants to derive a conclusion ~ should they rather diversify in shoes, or should they rather diversify in women’s purses? How would they make this decision?
- Well then, a natural question is: How do we measure the value of big data?
- Measuring data. Measuring volumes of data provokes a number of methodological questions. First, how can we distinguish data from information and from insight? Common definitions describe data as being raw indicators, information as the meaningful interpretation of those signals, and insight as an actionable piece of knowledge.
- For example - In this chart, someone has plotted cost per student for various regions. It makes a few of them stand out.
Let us now talk about analysis: this is a big part of being a data scientist.
- TECHNIQUES FOR ANALYZING BIG DATA
- There are many techniques that draw on disciplines such as statistics and computer science (particularly machine learning) that can be used to analyze datasets. This list is by no means exhaustive. Indeed, researchers continue to develop new techniques and improve on existing ones, particularly in response to the need to analyze new combinations of data.
- Also, note that not all of these techniques strictly require the use of big data—some of them can be applied effectively to smaller datasets (e.g., A/B testing, regression analysis). However, all of the techniques listed here can be applied to big data and, in general, larger and more diverse datasets can be used to generate more numerous and insightful results than smaller, less diverse ones.
- A/B testing. A technique in which a control group is compared with a variety of test groups in order to determine what treatments (i.e., changes) will improve a given objective variable, e.g., marketing response rate. This technique is also known as split testing or bucket testing. An example application is determining what copy text, layouts, images, or colors will improve conversion rates on an e-commerce Web site. Big data enables huge numbers of tests to be executed and analyzed, ensuring that groups are of sufficient size to detect meaningful (i.e., statistically significant) differences between the control and treatment groups (see statistics). When more than one variable is simultaneously manipulated in the treatment, the multivariate generalization of this technique, which applies statistical modeling, is often called “A/B/N” testing. What would an example look like?
- Imagine that Coke signs up with Facebook to work on marketing and sales. Facebook would place advertisements according to the customers. It can create several versions of an advertisement. Not all versions will suit every geography: some will suit the USA, some will suit India, and some can suit Indians living in the USA. What Facebook can do is choose a subset of people from a massive pool and place advertisements in their feeds according to whether those people love food or not. For each advertisement, Facebook will collect the responses, determine accordingly which advertisement does better, and use the better one on a larger pool of people. Does data science let someone determine the answer better? Absolutely! (A toy significance check for this kind of comparison is sketched after this list.)
- Association rule learning. A set of techniques for discovering interesting relationships, i.e., “association rules,” among variables in large databases. These techniques consist of a variety of algorithms to generate and test possible rules. One application is market basket analysis, in which a retailer can determine which products are frequently bought together and use this information for marketing (a commonly cited example is the discovery that many supermarket shoppers who buy diapers also tend to buy beer).
- Classification. A set of techniques to identify the categories in which new data points belong, based on a training set containing data points that have already been categorized. One application is the prediction of segment-specific customer behavior (e.g., buying decisions, churn rate, consumption rate) where there is a clear hypothesis or objective outcome. These techniques are often described as supervised learning because of the existence of a training set; they stand in contrast to cluster analysis, a type of unsupervised learning.
- Cluster analysis. A statistical method for classifying objects that splits a diverse group into smaller groups of similar objects, whose characteristics of similarity are not known in advance. An example of cluster analysis is segmenting consumers into self-similar groups for targeted marketing. This is a type of unsupervised learning because training data are not used. This technique is in contrast to classification, a type of supervised learning.
- Crowdsourcing. A technique for collecting data submitted by a large group of people or community (i.e., the “crowd”) through an open call, usually through networked media such as the Web. This is a type of mass collaboration and an instance of using Web 2.0.
- Data fusion and data integration. A set of techniques that integrate and analyze data from multiple sources in order to develop insights in ways that are more efficient and potentially more accurate than if they were developed by analyzing a single source of data.
- Data mining. A set of techniques to extract patterns from large datasets by combining methods from statistics and machine learning with database management. These techniques include association rule learning, cluster analysis, classification, and regression. Applications include mining customer data to determine segments most likely to respond to an offer, mining human resources data to identify characteristics of most successful employees, or market basket analysis to model the purchase behavior of customers.
- Ensemble learning. Using multiple predictive models (each developed using statistics and/or machine learning) to obtain better predictive performance than could be obtained from any of the constituent models. This is a type of supervised learning.
- Genetic algorithms. A technique used for optimization that is inspired by the process of natural evolution or “survival of the fittest.” In this technique, potential solutions are encoded as “chromosomes” that can combine and mutate. These individual chromosomes are selected for survival within a modeled “environment” that determines the fitness or performance of each individual in the population. Often described as a type of “evolutionary algorithm,” these algorithms are well-suited for solving nonlinear problems. Examples of applications include improving job scheduling in manufacturing and optimizing the performance of an investment portfolio.
- Machine learning. A subspecialty of computer science (within a field historically called “artificial intelligence”) concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data. A major focus of machine learning research is to automatically learn to recognize complex patterns and make intelligent decisions based on data. Natural language processing is an example of machine learning.
- Natural language processing (NLP). A set of techniques from a sub-specialty of computer science (within a field historically called “artificial intelligence”) and linguistics that uses computer algorithms to analyze human (natural) language. Many NLP techniques are types of machine learning. One application of NLP is using sentiment analysis on social media to determine how prospective customers are reacting to a branding campaign. Data from social media, analyzed by natural language processing, can be combined with real-time sales data, in order to determine what effect a marketing campaign is having on customer sentiment and purchasing behavior.
- Neural networks. Computational models, inspired by the structure and workings of biological neural networks (i.e., the cells and connections within a brain), that find patterns in data. Neural networks are well-suited for finding nonlinear patterns. They can be used for pattern recognition and optimization. Some neural network applications involve supervised learning and others involve unsupervised learning. Examples of applications include identifying high-value customers that are at risk of leaving a particular company and identifying fraudulent insurance claims.
- Network analysis. A set of techniques used to characterize relationships among discrete nodes in a graph or a network. In social network analysis, connections between individuals in a community or organization are analyzed, e.g., how information travels, or who has the most influence over whom. Examples of applications include identifying key opinion leaders to target for marketing, and identifying bottlenecks in enterprise information flows.
- Optimization. A portfolio of numerical techniques used to redesign complex systems and processes to improve their performance according to one or more objective measures (e.g., cost, speed, or reliability). Examples of applications include improving operational processes such as scheduling, routing, and floor layout, and making strategic decisions such as product range strategy, linked investment analysis, and R&D portfolio strategy. Genetic algorithms are an example of an optimization technique. Similarly, mixed-integer programming is another optimization technique.
- Pattern recognition. A set of machine learning techniques that assign some sort of output value (or label) to a given input value (or instance) according to a specific algorithm. Classification techniques are an example.
- Predictive modeling. A set of techniques in which a mathematical model is created or chosen to best predict the probability of an outcome. An example of an application in customer relationship management is the use of predictive models to estimate the likelihood that a customer will “churn” (i.e., change providers) or the likelihood that a customer can be cross-sold another product. Regression is one example of the many predictive modeling techniques.
- Regression. A set of statistical techniques to determine how the value of the dependent variable changes when one or more independent variables is modified. Often used for forecasting or prediction. Examples of applications include forecasting sales volumes based on various market and economic variables or determining what measurable manufacturing parameters most influence customer satisfaction. Used for data mining.
- Sentiment analysis. Application of natural language processing and other analytic techniques to identify and extract subjective information from source text material. Key aspects of these analyses include identifying the feature, aspect, or product about which a sentiment is being expressed, and determining the type, “polarity” (i.e., positive, negative, or neutral) and the degree and strength of the sentiment. Examples of applications include companies applying sentiment analysis to analyze social media (e.g., blogs, microblogs, and social networks) to determine how different customer segments and stakeholders are reacting to their products and actions.
- Signal processing. A set of techniques from electrical engineering and applied mathematics originally developed to analyze discrete and continuous signals, i.e., representations of analog physical quantities (even if represented digitally) such as radio signals, sounds, and images. This category includes techniques from signal detection theory, which quantifies the ability to discern between signal and noise. Sample applications include modeling for time series analysis or implementing data fusion to determine a more precise reading by combining data from a set of less precise data sources (i.e., extracting the signal from the noise). Signal processing techniques can be used to implement some types of data fusion. One example of an application is sensor data from the Internet of Things being combined to develop an integrated perspective on the performance of a complex distributed system such as an oil refinery.
- Spatial analysis. A set of techniques, some applied from statistics, which analyze the topological, geometric, or geographic properties encoded in a data set. Often the data for spatial analysis come from geographic information systems (GIS) that capture data including location information, e.g., addresses or latitude/longitude coordinates. Examples of applications include the incorporation of spatial data into spatial regressions (e.g., how is consumer willingness to purchase a product correlated with location?) or simulations (e.g., how would a manufacturing supply chain network perform with sites in different locations?).
- Statistics. The science of the collection, organization, and interpretation of data, including the design of surveys and experiments. Statistical techniques are often used to make judgments about what relationships between variables could have occurred by chance (the “null hypothesis”), and what relationships between variables likely result from some kind of underlying causal relationship (i.e., that are “statistically significant”). Statistical techniques are also used to reduce the likelihood of Type I errors (“false positives”) and Type II errors (“false negatives”). An example of an application is A/B testing to determine what types of marketing material will most increase revenue.
- Supervised learning. The set of machine learning techniques that infer a function or relationship from a set of training data. Examples include classification and support vector machines. This is different from unsupervised learning.
- Simulation. Modeling the behavior of complex systems, often used for forecasting, predicting and scenario planning. Monte Carlo simulations, for example, are a class of algorithms that rely on repeated random sampling, i.e., running thousands of simulations, each based on different assumptions. The result is a histogram that gives a probability distribution of outcomes. One application is assessing the likelihood of meeting financial targets given uncertainties about the success of various initiatives.
- Time series analysis. Set of techniques from both statistics and signal processing for analyzing sequences of data points, representing values at successive times, to extract meaningful characteristics from the data. Examples of time series analysis include the hourly value of a stock market index or the number of patients diagnosed with a given condition every day.
- Time series forecasting. Time series forecasting is the use of a model to predict future values of a time series based on known past values of the same or other series. Some of these techniques, e.g., structural modeling, decompose a series into trend, seasonal, and residual components, which can be useful for identifying cyclical patterns in the data. Examples of applications include forecasting sales figures, or predicting the number of people who will be diagnosed with an infectious disease.
- Unsupervised learning. A set of machine learning techniques that finds hidden structure in unlabeled data. Cluster analysis is an example of unsupervised learning (in contrast to supervised learning).
- Visualization. Techniques used for creating images, diagrams, or animations to communicate, understand, and improve the results of big data analyses. This expands into creating dashboards, on web or desktop platforms.
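To make the A/B testing entry above a bit more concrete, here is a toy two-proportion z-test in plain Python; the impression and conversion counts are invented, and a real experiment would also worry about sample-size planning and multiple comparisons.

```python
# A toy significance check for an A/B test (two-proportion z-test).
from statistics import NormalDist

conv_a, n_a = 120, 2400      # version A: conversions / impressions (made up)
conv_b, n_b = 150, 2400      # version B

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)              # pooled conversion rate
se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
z = (p_b - p_a) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))          # two-sided p-value

print(f"z = {z:.2f}, p-value = {p_value:.3f}")
# A small p-value (commonly < 0.05) suggests the difference between the
# two versions is unlikely to be random noise.
```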
Hope this somewhat elaborate write up gives you some inspiration to hold on to. Stay blessed and stay inspired!
Tough question to answer. People that mostly develop algorithms will have different design patterns from people that mostly build data pipelines and those that mostly build models.
There’s also the question that some data scientists have a more CS background and others have a less CS background. This plays heavily in how they code their data products. For people with less of a technical background, it is useful to understand design patterns. This doesn’t mean that they have to implement them by the book, but the simple fact that they were exposed to that knowledge makes a huge difference. Instead of just accepting the idiosyncrasies of the programming language they are using, they’ll have, at the very least, a different view on how a problem can be solved.
As an example, sometimes things take a long time in R. It’s not R’s fault; it’s the coder’s lack of exposure to the programming side of things.
With this and only this in mind, here are the ones I believe are beneficial for data scientists to know, particularly those who have less programming knowledge.
- Singleton. While singletons are close to impossible to implement properly in many languages, the basic concept of a singleton and how to access it from a global location could spare many problems for people who simply create global variables “because it’s easy” or “there’s no other way”.
- Module… what can I say? Many people don’t even refactor to isolate copy/pasted functions let alone write whole modules.
- Factory. I used factories quite a lot in game dev but I have to admit that I mostly use them to get pieces to build visualisations. My use case is more of a helper function than a factory but I started doing this with this design pattern in mind.
- And I’m not entirely sure if MVC is a design pattern, but I’ll leave it here. I think that approaching data products with the overall concept of MVC in mind makes their development much easier. It is true that this makes a lot more sense for interactive data products, but it isn’t completely far-fetched to conceptualise an automated data product as an MVC implementation. The model (in MVC, not a machine learning model) is our data, whatever data: training, test, validation or new production data. The controller is pre-processing, training and predicting. The view is the presentation layer, how we pass it to other systems or how other systems read the output. A minimal sketch of this framing follows.
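A minimal sketch of that framing, with invented class names and a stand-in "estimator", could look like this:

```python
# MVC applied to a toy batch data product, as described above.
class Model:
    """The data (MVC 'model', not an ML model): raw records."""
    def __init__(self, records):
        self.records = records

class Controller:
    """Pre-processing, training and prediction logic."""
    def preprocess(self, model):
        return [r.strip().lower() for r in model.records]

    def predict(self, cleaned):
        # Stand-in for a trained estimator: flag records containing "error".
        return ["error" in r for r in cleaned]

class View:
    """The presentation layer: how results are handed to other systems."""
    def render(self, predictions):
        return {"n_flagged": sum(predictions), "flags": predictions}

data = Model(["  OK run ", "ERROR in step 3", "ok again"])
controller = Controller()
print(View().render(controller.predict(controller.preprocess(data))))
```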
No number 5 but I hope these help.
Nice question!
For expanding your knowledge of design patterns, I recommend the book below.
There is also an excellent Udemy course by Mark Farragher on C# design patterns where he explores the Gang of Four and includes some code examples. If you search Udemy for “Mark Farragher C# structural and creational design patterns” you should find it.
As for different data structures, there are a couple of books I would recommend as well.
Well euhm, this question is really, really broad. ML is being used for sooo many awesome things right now that it's hard to cherry-pick, but let me give you some cool links:
- Realtime audio translation between any two languages with the new Google Pixel Buds
- Assisting radiologists by automatically detecting cancer in tissue scans
- Training robot arms to pick up all kinds of natural objects
- Using GANs to achieve new breakthroughs in cosmology
- Deep Photo Style Transfer
- Combining Style-transfer with Augmented Reality to create amazing visual experiences
- A great blogpost on Medium has a lot of other super cool links to GitHub projects of the past year: 30 Amazing Machine Learning Projects for the Past Year (v.2018)
- …
I mean, I could go on for hours like this. The application potential for Machine Learning is currently only limited by the human imagination. Wild things are coming guys!
Cheers ;)
Here’s one of the more interesting ones from my personal experience:
I once knew a guy, let’s call him Bob, who became really interested in machine learning after witnessing all the wonders it could do for society (and all the wonders it could do for his bank account after he saw the average annual pay of a machine learning engineer).
Anyways, Bob decided that for one of his CS classes, his end-of-term project would be a conversational chatbot. Since this was an introductory CS class, creating even a mediocre chatbot would be amazing (most of the projects were just remakes of Tetris using Python Tkinter).
Bob asked me for some advice on chatbots, since I had experience working with dialog systems before. I told him that, while the machine learning part wouldn’t be that bad (I pointed him to some basic tutorials in TensorFlow), finding the data and training the model would be the worst part.
“Chatbots need a ton of conversational data to train properly, so try to find a really large open-source conversational dataset,” I said.
Two months later (a day before the project was due), Bob came up to me again, anxious and panicking. I asked him if he had finished his chatbot, and he had. The problem, however, was that the chatbot would literally only give answers that included profanity. Example:
“Hi, my name is Allen”
“Well hello you #@$%”
“Excuse me?”
“You little #@%*”
Bob had ended up training his chatbot on a huge dataset from Reddit, which included all the uncensored comments on threads. Given the nature of Reddit threads, I was shocked that there wasn’t even more profanity from his chatbot.
Moral of the story: Know your dataset very well, and make sure to process and clean the data before training a model.
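As a tiny, hedged illustration of that moral (the word list and comments are invented, and real profanity filtering is far more involved), the cleaning step can be as simple as dropping offending lines before training:

```python
# Inspect and filter training text before using it.
BLOCKLIST = {"damn", "heck"}          # stand-in word list for the example

def is_clean(utterance: str) -> bool:
    tokens = utterance.lower().split()
    return not any(tok.strip(".,!?") in BLOCKLIST for tok in tokens)

raw_comments = [
    "Hi, how are you today?",
    "Well heck, that was unexpected!",
    "Thanks, see you later.",
]

training_data = [c for c in raw_comments if is_clean(c)]
print(training_data)       # the offending line is dropped before training
```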
P.S. Bob ended up doing an alternative project. It was a remake of Tetris in Tkinter.
The focus of machine learning is on building learning algorithms. The focus of data mining is on finding insights, regardless of methods.
The goal of machine learning is to make a prediction about something in the world. The goal of data mining is not well specified - you may not know what exactly you are looking for.
The evaluation of machine learning has well-defined metrics like precision, recall, type I/II errors and F scores. The evaluation of data mining findings is highly contextual to the use case or application.
In short, machine learning is about model building, but data mining is about knowledge discovery.
The big project of IBM Watson.
Watson is a question answering computer system capable of answering questions posed in natural language, developed in IBM's DeepQA project.
For each clue, Watson's three most probable responses were displayed on the television screen. Watson consistently outperformed its human opponents on the game's signaling device, but had trouble in a few categories, notably those having short clues containing only a few words.
In February 2013, IBM announced that Watson software system's first commercial application would be for utilization management decisions in lung cancer treatment at Memorial Sloan Kettering Cancer Center in conjunction with health insurance company WellPoint.
IBM Watson's former business chief Manoj Saxena says that 90% of nurses in the field who use Watson now follow its guidance.
Ref: Say Hello to IBM Watson and Watson (computer)
I find Apache Mahout an excellent option for building machine learning based applications (though I don't use it because I haven't yet needed that kind of scalability). It supports clustering, classification and batch-based filtering using a number of standard algorithms.
For reference - "Apache Mahout in Action" is a good book for a kickstart with machine learning and for learning about clustering, recommendation and classification algorithms using the Mahout library. For learning the basics of ML, you can also take a look at the Coursera course on Machine Learning by Andrew Ng - I am sure it has helped many people realise the power of this new, emerging science.
The great thing about this library is that it is highly scalable, and the examples provided in the release give useful insights into using it to build full-fledged applications. One other big advantage is that it harnesses the capability of Apache Hadoop and MapReduce to scale to the datasets on which the different algorithms run, achieving great performance through distributed computing.
Apart from this, for non-CS people, I would say Weka is pretty easy and user-friendly for applications of ML and big data. Secondly, you can also extend its libraries by using the tar.
I hope this helps :-)
Machine learning (ML) algorithms are not currently good at the kind of pattern recognition you are referring to:
It9s = It’s
You9ll = You’ll
Don9t = ?
This seems to require reasoning and not just mapping from one vector space to another. Current ML models are excellent at mapping input vectors to output vectors but not reasoning. Thus to solve such a problem we need a form of reasoning in machines and lots of such training data.
Thus the model closest to what we are looking for is a memory network, which is basically just a typical neural network (NN) attached to a block of memory for fact storage. Memory networks are able to reason to some extent, and one of the more interesting versions of memory-augmented NNs is the differentiable neural computer (DNC). These systems can learn to discover some facts and patterns that are otherwise hard for a mapping-only ML model to discover.
Recurrent neural networks (RNN) are also Turing complete, so they may be able to learn that pattern when provided with sufficient examples. In fact, the long short-term memory (LSTM) network is a type of memory network and hence it can reason somewhat, well, a little. Turing complete means being able to evaluate any computable function.
Thus the following is the approach you can use to build such an ML model.
- Use word embeddings using word2vec as the first processing stage. The words can be compactly represented that way instead of using a one-hot encoded long vector.
- Use a memory network such as the neural Turing machine (NTM) or the DNC. These systems are Turing complete and can discover interesting patterns in data. They are also able to learn methods for solving problems such as puzzles, so they are the most likely to solve this one.
- Or use an RNN, especially one that doesn't have long-term dependency issues, such as the LSTM or the gated recurrent unit (GRU) network. Being also Turing complete, these networks can solve the problem too.
But I am not sure how such a model can perform on that problem above. The only way to know is to build and try it on the data.
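As a small hedged sketch of what "build and try it" could start with, here is the data-preparation step only: framing the corrections as character-level input/output pairs. The pairs and encoding are invented, and a real experiment would need far more data before training an LSTM/GRU or memory-augmented model on it.

```python
# Frame the "It9s -> It's" corrections as character-level sequence pairs.
pairs = [
    ("It9s", "It's"),
    ("You9ll", "You'll"),
    ("We9re", "We're"),
    ("Don9t", "Don't"),
]

# Build a character vocabulary and encode each string as index lists,
# the usual first step before feeding sequences to a recurrent model.
chars = sorted({c for src, tgt in pairs for c in src + tgt})
char_to_idx = {c: i for i, c in enumerate(chars)}

def encode(text):
    return [char_to_idx[c] for c in text]

dataset = [(encode(src), encode(tgt)) for src, tgt in pairs]
print(dataset[0])   # indices of "It9s" paired with indices of "It's"
```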
Hope this helps.
Virtual personal assistants and other progressive technologies rely on advances in Artificial Intelligence. The most popular AI fields are natural language processing, machine learning, and deep learning. Big companies employ them in activities ranging from online advertising targeting to self-driving cars. Consequently, ML experts are in demand, and ML and deep learning are some of the hottest skills currently. The number of tools that simplify the programmers’ work is growing too.
The focus on Java machine learning reflects the popularity of the language. Due to its extreme stability, leading organizations and enterprises have been adopting Java for decades. It’s widely used in mobile app development for Android which serves billions of users worldwide.
For implementing machine learning algorithms Java developers can utilize various tools and libraries. At least 90 Java-based ML projects are listed on MLOSS alone.
Apache Mahout is a distributed linear algebra framework and mathematically expressive Scala DSL. The software is written in Java and Scala and is suitable for mathematicians, statisticians, data scientists, and analytics professionals. Built-in machine learning algorithms facilitate easier and faster implementation of new ones.
Mahout is built atop scalable distributed architectures. It uses the MapReduce approach for processing and generating datasets with a parallel, distributed algorithm utilizing a cluster of servers. Mahout features console interface and Java API to scalable algorithms for clustering, classification, and collaborative filtering. Apache Spark is the recommended out-of-the-box distributed back-end, but Mahout supports multiple distributed backends.
Mahout is business-ready and useful for solving three types of problems:
1. item recommendation, for example, in a recommendation system;
2. clustering, e.g., to make groups of topically-related documents;
3. classification, e.g., learning which topic to assign to an unlabeled document.
Types of design patterns
1. Creational:
These patterns are designed for class instantiation. They can be either class-creation patterns or object-creational patterns.
2. Structural:
These patterns are designed with regard to a class's structure and composition. The main goal of most of these patterns is to increase the functionality of the class(es) involved, without changing much of its composition.
3. Behavioral:
These patterns are designed depending on how one class communicates with others.
If you're working on a data-driven project that relies on analytics, mining, science, and machine learning to succeed, you need to get these steps right. A revolution in information and knowledge has taken place as a result of the exponential growth of data. In today's research and strategy development, obtaining in-depth knowledge and essential information from available data is a critical component.
If I can, I'll try to give a brief introduction to each of the terms you've mentioned in your question. This is where it all begins...
- To get started, you'll need to know where your information is coming from as well as how much data there is and at what speed the data is being produced. If these conditions are met, it can be classified as Big Data, but if they are not, it is just data to be processed without any fancy names.
- All that data you are ingesting must be formatted and cleaned before it can be used. This process is known as Data Analysis.
- As soon as you know what questions you want answered and where to find the data that contains the answers, you're doing Data Analytics.
- Use Data Mining when you don't know what to ask or where to look for the answers to your questions.
- Data Science refers to the tools and techniques used in both Data Analytics and Data Mining to make the extraction of insights easier.
- As a result of Machine Learning, some tools and techniques are in the form of self-learning programs.
Is there a distinction?
For example, data scientists are responsible for developing data-driven products and applications that handle data in a way that conventional frameworks cannot. Much more emphasis is placed on the specialized abilities to deal with data in data science. For example, it can be used to determine the impact of data on an item or association, as opposed to mining and machine learning.
In contrast to data science, which focuses on the study of data, data mining focuses on the process: it paves the way for discovering newer patterns in large data collections. Since it relies on algorithms, it is clearly similar to machine learning. However, algorithms and calculations are only part of data mining, whereas in machine learning algorithms are used directly to extract information from datasets. Data mining incorporates such algorithms as just one step of the overall procedure, and it differs from ML in that it does not rely solely on them.
If you're interested in learning more about these topics, you can enroll in a course in this field to obtain the desired job role. In this case, I'd recommend Udacity, UpGrad, Simplilearn, and Learnbay, among other options.
- Udacity
Data Analyst Nanodegree
You'll learn everything you need to know about data analytics in Udacity's Data Analyst Nanodegree. Four courses make up the curriculum, each of which includes assignments you can show potential employers. Data analytics technologies such as R, Python, and Tableau are also covered. You'll be required to apply what you've learned in real-world projects inspired by industry companies or those that they provide.
2. Upgrad
- As part of its Online Management PG Program in Data Science, Upgrad offers an intensive 12-month program designed to impart knowledge on highly sought-after data scientist skills.
- This 12-month program is designed for working professionals and includes 450 hours of study, 30+ case studies, and placement assistance while allowing you to work on 12+ industry environments and various programming tools.
- For this program, you must pass a coding/algorithm-based exam to be accepted.
3. Simplilearn
As a recommendation, this course should be taken by anyone interested in learning exclusively through an online BootCamp.
To learn through BootCamp, Simplilearn is the best option for you. Bootcamp courses are now available at Simplilearn. Live, non-interactive classes in machine learning provide hands-on learning through real-world projects. It's also worth noting that temporary job assistance programs are available.
4. Learnbay
These courses are designed for working professionals who want to advance their careers in this field and are IBM Certified data science courses that are globally recognized. They offer a variety of courses for beginners, as well as help from their experts. In addition to being less expensive than well-known institutions, their course structure is well-designed, providing both important data science ideas and actual data science experiences.
Learnbay offers the following courses:
Data Science & AI Certification With Domain Specialisation
- Those who are new to the profession will benefit most from this training. Anyone with at least one year of experience in any field is welcome to enroll.
- For this course to be completed, it will take approximately 7.5 months.
- There is a cost of 65,000 rupees for the course.
Advance AI & ML Certification | Become AI Expert In Product based MNCs
- This course is designed for individuals who have 4+ years of experience working in the tech industry.
- This course will take nine months to complete.
- You will have to pay 79,900 rupees for the course.
Data Science & AI Certification Program For Managers and Leaders
- A functional expert is someone who has worked for at least 8+ years in any domain.
- You will have to pay 79,900 rupees for the course.
- For this course to be completed, you will need 11 months to do so.
Data Science & Business Analytics Program | Fast Track Course
- Only people who have been out of work for at least six months are eligible for this course, which is designed for them.
- This course will take four months to complete.
- The cost of the course is 50,000 rupees.
You can choose any of the above programs based on your experience and knowledge.
What do you need to keep in mind?
There's a great combination of industry-based learning and a very valuable industrial capstone project from multinational corporations.
You must be able to attend all of the live online sessions, because the courses are delivered only through them (weekend and weekday schedules are available). This training will be extremely beneficial to non-technical candidates. An additional session, Module 0, is available for basic programming support. To prepare for a data analyst career, the job aid includes mock interview tests and soft-skill grooming. To gain employment, you must enroll in such a program.
Here are a few things you'll enjoy about this course:
- As a 200+ hour online course geared toward working professionals, the Learnbay Data Science and machine learning Certification program is open to beginners as well, as it has a well-structured curriculum that includes all prerequisites. Therefore, it covers everything from the basics to the advanced.
- Theoretical knowledge is important, but implementation is more important in Data Science. Learnbay offers 15+ real-time projects to work on with proper supervision, and having completed them gives you a significant advantage and a strong addition to your CV; this is where Learnbay differs from other institutes.
- Data Science/AI professionals who are IIT/IIM graduates are teaching the class, and they have a great deal of experience in the field.
- Their job aid program prepares you for product-based firms through mock interviews, and if you complete the program successfully, both companies will offer you referral interviews. The result is that many of its graduates work as Data Scientists for companies like IBM, TCS, and Accenture.
Currently, Learnbay offers domain specializations for the course mentioned above in the following areas:
- HR, marketing, and sales
- Manufacturing, telecommunications, and mechanical
- Healthcare and pharma
- Transportation, media, and hospitality
- BFSI
- Oil, gas, and energy
Apart from this, the course offers elective modules for technical professionals, such as:
- Advanced data structures and algorithms (suitable for programming ninjas)
- Cloud and DevOps (suitable for IT professionals with a cloud computing background)
- IoT and embedded systems (suited to auto-engineering-related backgrounds)
Experienced professionals who want a lucrative data science career can take advantage of the available elective modules. Candidates from oil and gas, HR, marketing, and telecommunications backgrounds are especially likely to benefit.
Learnbay also offers several data analytics courses that can help you switch from one career to another. In my personal opinion, Learnbay's processes and teaching methods make it easier to take on specialized machine learning theory and the other course material. Although there are several other educational institutions, based on my research I would recommend Learnbay as a good option.
The future is bright for you!!!
There are different libraries in Java that implement machine learning algorithms. Here is a classification based on the type of task they are made for:
1- Text processing
a- LingPipe: for computational linguistics tasks such as topic classification, entity extraction, clustering, and sentiment analysis.
b- GATE: an open-source library for text processing. It provides an array of sub-projects targeted at different use cases.
c- MALLET: focused on statistical natural language processing; the algorithms it implements cover document classification, clustering, and topic modeling.
d- Tagme: on-the-fly annotation of short text fragments [ http://tagme.di.unipi.it ]. It is a "topic annotator" that can identify meaningful sequences of words in a short text and link them to a pertinent Wikipedia page.
2- Computer vision
a- BoofCV: an open-source library for computer vision and robotics applications. Its features include image processing, features, geometric vision, calibration, recognition, and image data IO.
3- Deep learning
a- Deeplearning4j: a commercial-grade deep learning library written in Java. It is compatible with Hadoop and Spark.
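The Java libraries above are the natural choice on the JVM. Purely to illustrate the kind of task they handle, here is a minimal document-classification sketch; it deliberately uses Python's scikit-learn rather than any of the Java libraries listed, and the toy documents and labels are made up:

    # Illustrative only: scikit-learn stands in for the Java libraries above.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    docs = [
        "the match ended with a late goal",        # sports
        "the striker scored twice in the final",   # sports
        "the new phone ships with a faster chip",  # tech
        "the laptop gets a better battery",        # tech
    ]
    labels = ["sports", "sports", "tech", "tech"]

    # TF-IDF features plus a linear classifier: a common baseline for topic classification
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(docs, labels)
    print(model.predict(["a new tablet with a bigger battery"]))  # likely ['tech']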
If you are facing memory issues, you might want to consider online learning algorithms (ones that learn a concept by looking at one training example at a time) as opposed to algorithms that need batch processing (ones that learn a concept by looking at the entire dataset at once).
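As a minimal sketch of the online-learning idea (assuming scikit-learn is available; the streamed chunks here are randomly generated stand-ins for data read from disk), SGDClassifier can be updated one mini-batch at a time with partial_fit, so the full dataset never has to sit in memory:

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.default_rng(0)
    model = SGDClassifier(loss="log_loss")     # logistic regression trained by SGD

    classes = np.array([0, 1])                 # all class labels must be declared up front
    for _ in range(100):                       # imagine each iteration reads one small chunk from disk
        X_chunk = rng.normal(size=(32, 5))     # 32 synthetic examples, 5 features
        y_chunk = (X_chunk[:, 0] > 0).astype(int)
        model.partial_fit(X_chunk, y_chunk, classes=classes)

    print(model.predict(rng.normal(size=(3, 5))))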
There's also a huge library of machine learning courses on Pluralsight.
Analytics is the discovery and communication of meaningful patterns in data, using tabulation and visualization techniques to communicate insights. It is especially valuable in areas rich with recorded information. Analytics relies on the simultaneous application of computer programming and quantitative techniques such as statistics and operations research to quantify performance. Analytics may be used as input for human decisions or may drive fully automated decisions. Data analysis, by contrast, is used to generate insights that are communicated to recommend actions. In a nutshell, analytics is a combination of data analysis, insights, and decision making.
Data mining: as mentioned in previous answers, it can be categorized as a specific set of tools for finding relationships and patterns in data.
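To make the "tabulation of meaningful patterns" idea concrete, here is a minimal sketch with pandas; the sales table, column names, and numbers are purely hypothetical:

    import pandas as pd

    # Hypothetical transaction records
    sales = pd.DataFrame({
        "region":  ["North", "North", "South", "South", "South"],
        "channel": ["web", "store", "web", "web", "store"],
        "revenue": [120.0, 80.0, 200.0, 180.0, 60.0],
    })

    # The tabulation step of analytics: summarize revenue by region and channel
    summary = sales.pivot_table(values="revenue", index="region",
                                columns="channel", aggfunc="sum")
    print(summary)
    # The table surfaces a pattern (the South region's web channel dominates revenue),
    # which an analyst would then communicate and turn into a decision.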
Why should you use GitHub?
If you are wondering what the point of searching for ‘top 10 Github machine learning projects’ on Google is and what the fuss is about it, check out this section where you will find answers to those questions. Below we have listed a few points highlighting the importance of using GitHub for your machine learning projects.
- GitHub is the perfect platform to showcase your skills by sharing the detailed code of the projects you have worked on. You can use GitHub to create a data science portfolio, making it easy for a recruiter to understand your calibre and see which tools and techniques you have explored so far.
- As you may have noticed in the third section of the machine learning projects on GitHub discussed above, GitHub hosts open-source projects whose code users can read to thoroughly understand the algorithms, frameworks, and libraries they use.
- GitHub offers exciting features on its website to help you track the changes in your projects easily.
- GitHub integrates with Google Colab. That means if you use Google Colab's lightning speed to implement your ML projects and want to showcase a project in your portfolio, publishing it is just a click away.
- GitHub supports all the programming languages widely used in the data science community, such as R, Python, and Scala.
What are the most popular and best Machine Learning Projects on Github?
The most popular and best machine learning projects on GitHub are usually open-source projects. These include Tesseract, Keras, scikit-learn, Apache PredictionIO, deepchecks, etc. All of these projects have their source code available on GitHub, so if you are looking for well-known machine learning GitHub projects, we suggest you start with their official repositories. These projects are exciting, and as a beginner you should not miss out on them.
Is it valuable to post your Machine Learning projects on Github if you want to get into an ML PhD program?
Yes, it is a good practice to upload your Machine Learning projects on GitHub that you have worked on. These projects will support your application to an ML PhD program as they will give the admission committee a fair idea of your inclination towards the subject. They will highlight your desire to pursue the field and reflect that you are genuinely interested in exploring machine learning.
Let me answer your second question: are data mining and machine learning related?
Yes. In simple terms, data mining produces the required ingredient for ML, i.e.
1. What chicken is to chicken biryani, data mining is to ML
2. What idiot is to Trump, data mining is to ML
3. What words are to a language, data mining is to ML
Now moving on to the first part of your question. Below is a rough to-do list.
1. The earlier you understand that ML/DA/AI is more about maths and pattern recognition than cold-blooded code, the higher your probability of succeeding in this genre of study (a short illustration follows after this list).
2. So go through your algebra and calculus syllabus again.
3. Learn R and/or Python (better if both).
4. Go through textbooks (control your inner programmer; theory is much more important... go through statistics and probability books)... let ...
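As a short illustration of the "it is mostly maths" point from item 1, here is plain gradient descent fitting a straight line with NumPy; the synthetic data, learning rate, and iteration count are arbitrary choices for the example:

    import numpy as np

    # Synthetic data: y = 3x + 2 plus a little noise (made up for illustration)
    rng = np.random.default_rng(42)
    x = rng.uniform(-1, 1, size=200)
    y = 3.0 * x + 2.0 + rng.normal(scale=0.1, size=200)

    # Fit y ≈ w*x + b by minimizing mean squared error with gradient descent
    w, b, lr = 0.0, 0.0, 0.1
    for _ in range(500):
        err = (w * x + b) - y
        w -= lr * 2.0 * np.mean(err * x)   # d(MSE)/dw
        b -= lr * 2.0 * np.mean(err)       # d(MSE)/db

    print(w, b)   # should land close to 3 and 2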
Please check the website Web Development and Web Scraping Service Provider - Divinfosys.com.
Assuming you are proficient in R/Python, here are my suggestions. These are all based on my experiences over the last few years on my way to learning AI:
Stage 1 - Take the 'Intro to Machine Learning' course on Udacity. It will help you revisit the concepts, which are sometimes explained more concisely there. Andrew Ng's course is great, but at times it becomes too heavy on the details. Skip this stage if you are already confident.
I personally prefer a 'why should/shouldn't you use this method' style, with a solid understanding of the details behind it. Hence the next stage - application.
St…

I don't know what you mean by design patterns. I think data mining extracts useful data from one or more sources for later analysis, whether in an ad hoc or automated way, while machine learning predicts outcomes for future data records based on a training dataset. Both inform an organization about its current sources of data and what conclusions can be drawn from them. A tiny example of the "learn from a training set, then predict" idea is sketched below.
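As a tiny, hypothetical sketch of "learn from a training dataset, then predict outcomes for future records" (the features, numbers, and churn labels are made up):

    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical training records: [age, monthly_spend] -> churned (1) or not (0)
    X_train = [[25, 40.0], [34, 10.0], [45, 95.0], [52, 5.0], [23, 60.0], [41, 8.0]]
    y_train = [0, 1, 0, 1, 0, 1]

    model = DecisionTreeClassifier(max_depth=2)
    model.fit(X_train, y_train)            # learn from the training dataset

    # Predict outcomes for future, unseen records
    print(model.predict([[30, 50.0], [48, 7.0]]))   # e.g. [0 1]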