Most Commonly Asked Machine Learning Interview Questions
In this article, we will talk about machine learning and the most important questions you can expect in your interview.
Machine learning can broadly be classified into four categories:
- Classical Learning
- Neural Networks and Deep Learning
- Ensemble Methods
- Reinforcement Learning
Our article will deal with each of these and provide the most frequently asked questions from those domains. There are four types of questions you can expect in your interview or any preliminary round, and they are as follows:
- Modeling case study questions
- Core machine learning interview questions
- Recommendation engines & search engines
- Questions based on Python
Keep in mind that the degree of difficulty depends on the profile or job role you are applying for. Business-oriented roles call for preparation in applied machine learning, while more senior roles such as data scientist or research scientist require deeper experience and skills.
Modeling Case study:
Tenda is a product-based company that manufactures goods like routers, modems, and optic fiber. Its product engineers, design engineers, material engineers, and their respective teams worked hard to build a new product category in the router product line. The product reaches its end users through multiple channels, such as ISPs, network providers, cable connection owners, and e-commerce websites. A returned product is treated as lost revenue and an embarrassment, and retailers can also return unsold products, which adds to Tenda's losses. The challenges can include the following:
- unavailability of data
- unstructured data
- the months it takes to gather data and validate its sufficiency
Here, the relevant information about the product features is available, along with past transaction data, customer reviews, and the product's warranty details.
Basic Machine Learning Interview Questions
Q1. How can we ascertain the volume of the returned products, followed by the reasons for return?
The data engineers can use NLP techniques such as word embeddings, N-grams, term frequency-inverse document frequency (TF-IDF), Latent Dirichlet Allocation, Support Vector Machines, and Long Short-Term Memory networks, together with tokenization, lemmatization, and part-of-speech tagging. Applied to the customer reviews, this pipeline will surface the reasons behind the returned products.
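A minimal sketch of how such a pipeline could look, assuming the return comments have already been gathered into a small hypothetical list of strings and that scikit-learn is available:

```python
# Sketch (not the article's exact pipeline): topic-model hypothetical
# return comments with LDA to surface candidate reasons for return.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical free-text comments attached to returned products.
return_comments = [
    "router keeps dropping the wifi connection every hour",
    "modem overheats and restarts after a few hours",
    "optic fiber cable arrived damaged and frayed",
    "wifi signal is weak and the connection drops constantly",
]

# LDA works on raw term counts; a TF-IDF matrix would instead feed
# classifiers such as the SVM mentioned above.
vectorizer = CountVectorizer(ngram_range=(1, 2), stop_words="english")
X = vectorizer.fit_transform(return_comments)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Print the top terms of each latent topic as candidate return reasons.
terms = vectorizer.get_feature_names_out()
for idx, topic in enumerate(lda.components_):
    print(f"Topic {idx}:", [terms[i] for i in topic.argsort()[-5:]])
```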
Q2. The business wants a solution that can predict failure rates so it can build a better product. How would you approach this?
NLP combined with BI tools for data visualization can feed a forecasting model that identifies the pattern and frequency of complaints per product line. Text analytics acts as an acid test, separating products that were merely returned from products that are actually defective. The problem of data insufficiency is tackled by careful data sampling while guarding against sampling bias.
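For instance, a simple sketch of the complaint-frequency step, assuming a hypothetical complaints.csv with date, product_line, and complaint columns:

```python
# Sketch: monthly complaint counts per product line as input to a forecasting model.
# Assumes a hypothetical complaints.csv with 'date', 'product_line', 'complaint' columns.
import pandas as pd

df = pd.read_csv("complaints.csv", parse_dates=["date"])

monthly = (
    df.set_index("date")
      .groupby("product_line")
      .resample("MS")["complaint"]   # "MS" = calendar month, anchored at month start
      .count()
)
print(monthly.head())
```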
Recommendation engines & search engines:
How would you build a recommendation engine, like a type-ahead search?
Based on the origin of the data, we can blend collaborative filtering with content-based (characteristic) filtering; this convergence is achieved through a hybrid approach. We can also classify the algorithms by how they use the data, for instance memory-based versus model-based, and a hybrid approach mixes memory-based and model-based algorithms.
Then we need to focus on collecting feedback, both explicit and implicit.
Next, we build a users x items matrix from the ratings to find similar users or items and recommend those with high similarity.
Using the K-Nearest Neighbors algorithm, we find the K closest data points and predict by averaging the neighbors' ratings.
Finally, we load the data from the database, find each user's nearest neighbors, and use the neighbors' ratings to predict that user's ratings; this is, for example, the RapidMiner process. It lets us train the system by running trials on a large chunk of the training data.
At the end, we add content-based recommenders, framed as an ML classification problem: items are classified according to their content to match the users' interests, and users are profiled based on the content they consume.
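A minimal memory-based sketch of the users x items step above, assuming a tiny hypothetical ratings matrix (rows are users, columns are items, 0 means not rated):

```python
# Sketch: memory-based collaborative filtering with k-nearest neighbours.
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical user x item ratings matrix (0 = not rated).
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
])

# Find each user's nearest neighbours by cosine similarity of their ratings.
knn = NearestNeighbors(metric="cosine", n_neighbors=2).fit(ratings)
distances, neighbours = knn.kneighbors(ratings[0:1], n_neighbors=3)

# Skip the user itself (distance 0) and average the neighbours' ratings
# to predict scores for items the user has not rated yet.
neighbour_ids = neighbours[0][1:]
predicted = ratings[neighbour_ids].mean(axis=0)
print("Predicted ratings for user 0:", predicted)
```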
Core Machine Learning Interview Questions
How would you differentiate AI, Machine learning, and deep learning from each other?
Artificial Intelligence is the broad technique of giving machines the computing prowess to mimic human behavior. Machine learning uses statistical methods to enable this behavior, and it includes deep learning as a subset. Deep learning, in turn, uses multi-layered neural networks trained on vast amounts of data, such as speech and images, to simulate human decision-making.
Illustrate the different types of Classical Learning?
There are two types of Classical Machine learning:
- Supervised learning
- Unsupervised learning
Both typically work with simple tabular data that has clear features.
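A minimal sketch contrasting the two on the same tabular data, using scikit-learn's built-in Iris dataset:

```python
# Sketch: supervised vs. unsupervised learning on the same tabular data.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised: the labels y are given, and the model learns to predict them.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("Supervised accuracy:", clf.score(X, y))

# Unsupervised: no labels are given; the model only groups similar rows.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("Cluster assignments:", km.labels_[:10])
```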
What is Bayes' theorem?
Bayes' theorem allows us to determine posterior probabilities from our priors in the light of evidence. It is a method of revising existing predictions when new evidence arrives, and it forms the fundamental assumption behind the Naive Bayes classifier.
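A small worked example with hypothetical numbers for a diagnostic test:

```python
# Sketch: Bayes' theorem  P(A|B) = P(B|A) * P(A) / P(B)
# with hypothetical numbers for a diagnostic test.
p_disease = 0.01            # prior P(A)
p_pos_given_disease = 0.95  # likelihood P(B|A)
p_pos_given_healthy = 0.05  # false-positive rate

# Evidence P(B): total probability of a positive test.
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Posterior P(A|B): probability of disease given a positive test.
posterior = p_pos_given_disease * p_disease / p_pos
print(round(posterior, 3))  # ~0.161
```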
What is the difference between generative & discriminative models?
Discriminative models learn the decision boundaries between classes. An SVM is discriminative because it learns a decision boundary and serves as a maximum-margin classifier. Discriminative models do not handle outliers well, and they rely on maximum likelihood estimation: we maximize the conditional likelihood given the model parameters.
Generative models learn the distribution of the classes themselves. The Naive Bayes classifier is generative because it models the distribution of the classes, and it handles outliers better than discriminative models do. In generative models we maximize the joint likelihood, i.e., the joint probability given the model parameters.
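A minimal sketch that fits both kinds of model on the same synthetic data (a linear-kernel SVM as the discriminative model, Gaussian Naive Bayes as the generative one):

```python
# Sketch: a discriminative model (linear-kernel SVM) vs. a generative model (Gaussian Naive Bayes).
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Discriminative: learns the decision boundary, i.e. P(y | x) directly.
svm = SVC(kernel="linear").fit(X, y)

# Generative: models the class-conditional distributions P(x | y) and the class priors P(y).
nb = GaussianNB().fit(X, y)

print("SVM accuracy:", svm.score(X, y))
print("Naive Bayes accuracy:", nb.score(X, y))
```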
Define Data Normalization and its needs?
We spend a great deal of time normalizing our data to get it clean and set up. Data normalization is a preprocessing step that standardizes the data. It helps minimize or eliminate data redundancy and rescales values into a desired range, which helps models converge. It also improves the structure and integrity of the data. In short, the need for normalization comes down to clean data in, clean data out.
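A minimal sketch of two common normalization schemes on a toy feature matrix:

```python
# Sketch: two common ways to normalize features before training.
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 500.0]])

# Standardization: zero mean, unit variance per feature.
print(StandardScaler().fit_transform(X))

# Min-max scaling: rescales every feature into the [0, 1] range.
print(MinMaxScaler().fit_transform(X))
```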
Which cross-validation technique would you use on a time series dataset?
In normal cross-validation, such as K-fold CV, we split the data into K equal-sized chunks, use K-1 chunks for training and the remaining chunk for testing, and average the performance over the K tests to get an overall measure. With a time series, however, we cannot include samples in the training set that occurred after the test point: selecting a point as the test set means only the observations before it may serve as the training set (forward chaining).
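A minimal sketch of forward-chaining splits using scikit-learn's TimeSeriesSplit:

```python
# Sketch: forward-chaining cross-validation for time-ordered data.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(10).reshape(-1, 1)  # observations in time order

# Each split trains only on points that occur before the test points.
for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(X):
    print("train:", train_idx, "test:", test_idx)
```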
How is a decision tree pruned?
Pruning removes nodes and branches from a decision tree to make it simpler while keeping accuracy as high as possible. In cost-complexity pruning, subtrees are removed and the pruned tree is compared with the original on a validation set; if the performance does not differ significantly, the simpler tree is kept.
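A minimal sketch of cost-complexity pruning in scikit-learn, validated on a held-out set:

```python
# Sketch: cost-complexity pruning of a decision tree, checked on a validation set.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# A larger ccp_alpha removes more nodes and branches, giving a simpler tree.
for alpha in [0.0, 0.01, 0.02]:
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_tr, y_tr)
    print(f"alpha={alpha}: nodes={tree.tree_.node_count}, "
          f"validation accuracy={tree.score(X_val, y_val):.3f}")
```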
How do you handle an imbalanced dataset?
If the over-represented class has plenty of data, we can start with random undersampling, which removes samples of the over-represented class from the training data. If, on the contrary, there is little data to work with, we perform random oversampling, sampling the under-represented class with replacement until the required ratio is reached. SMOTE, or Synthetic Minority Oversampling Technique, goes further and synthesizes new minority-class samples. Balancing the classes this way helps mitigate over-fitting to a specific class.
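A minimal sketch of oversampling with SMOTE, assuming the imbalanced-learn package is installed:

```python
# Sketch: oversampling a minority class with SMOTE (requires the imbalanced-learn package).
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)
print("Before:", Counter(y))

# SMOTE synthesizes new minority-class samples instead of merely duplicating them.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("After:", Counter(y_res))
```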
Define a Boltzmann Machine?
It is a simplified version of the multi-layer perceptron with a visible input layer and a hidden layer; the Boltzmann machine is almost always shallow. These two-layer neural nets make stochastic decisions about whether a neuron should be on or off. When nodes form connections only across the layers, with no two nodes of the same layer connected, it is known as a Restricted Boltzmann Machine.
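A minimal sketch of a Restricted Boltzmann Machine used as a feature learner, via scikit-learn's BernoulliRBM on random binary data:

```python
# Sketch: a Restricted Boltzmann Machine as a feature learner (scikit-learn's BernoulliRBM).
import numpy as np
from sklearn.neural_network import BernoulliRBM

# Binary inputs: one visible layer, one hidden layer, no connections within a layer.
X = np.random.RandomState(0).randint(0, 2, size=(100, 16))

rbm = BernoulliRBM(n_components=8, learning_rate=0.05, n_iter=20, random_state=0)
hidden = rbm.fit_transform(X)  # hidden-layer representation learned from X
print(hidden.shape)  # (100, 8)
```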
How will you differentiate between Classification and regression in ML?
Classification and regression are two of the prediction problems that arise in supervised and unsupervised learning; altogether there are classification, regression, clustering, and association problems. Classification assigns data to separate, discrete categories based on the input parameters, and it is mostly used to predict the occurrence of an event depending on the degree of association of the variables. Regression, on the other hand, models a continuous, real-valued output and can, for instance, predict movement based on historical data.
How will you define Logistic Regression?
Logistic regression is mostly used for classification problems to predict the group to which the object under study belongs, and these probabilities have to be translated into binary values to get the prediction. It measures the relationship between what we want to predict and the independent variables by means of the logistic (sigmoid) function.
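A minimal sketch of logistic regression turning inputs into probabilities and then into binary predictions:

```python
# Sketch: logistic regression mapping inputs to a probability, then to a binary class.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Scaling first helps the solver converge; the logistic (sigmoid) function then
# turns the linear score into a probability, thresholded at 0.5 by default.
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)

print(model.predict_proba(X[:3]))  # class probabilities
print(model.predict(X[:3]))        # binary predictions
```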
Explain how the Receiver operating characteristic curve works?
It has its roots in signal detection theory and applies to binary classification problems. The ROC curve measures the performance of binary classifiers by plotting the trade-off between the true positive rate and the false positive rate at different thresholds.
When both sensitivity and fallout are zero, the classifier is predicting everything as negative.
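A minimal sketch computing the ROC trade-off and AUC for a binary classifier on synthetic data:

```python
# Sketch: ROC curve points and AUC for a binary classifier.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Use the predicted probability of the positive class as the score.
scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

# Each threshold trades the true-positive rate against the false-positive rate.
fpr, tpr, thresholds = roc_curve(y_te, scores)
print("AUC:", roc_auc_score(y_te, scores))
```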
What is Auto ML?
It is a relatively new machine learning paradigm that automates the application of algorithms to any type of input data to produce results. It simplifies the work of a data scientist with pre-built techniques.
How would you define the confusion matrix and its elements? Explain with examples.
A Confusion Matrix is a grid used to summarize the performance of classification algorithms.
Imagine we have a medical dataset with features like:
- Chest Pain
- Good blood circulation
- Blocked Arteries
- Weight
We have to apply ML to predict whether or not someone will develop heart disease. There are tons of methods, like logistic regression, K-nearest neighbors, and random forests, to get the results, and deciding which one best fits our data is a critical problem. Thus, we divide the data into training and testing sets, and we can use cross-validation (CV) to get quicker and more reliable estimates. We then evaluate each method on the testing set, and the confusion matrix summarizes how each method performed on those test records. In the confusion matrix:
- Rows correspond to the ML prediction of the algorithm.
- Columns correspond to the truth.
Since we have only two categories to choose from:
- Heart disease
- Does not have heart disease
The top-left corner contains the TRUE Positives. These are the patients who had heart disease and were correctly identified by the algorithm.
The TRUE Negatives are in the bottom-right corner. These are the patients who did not have heart disease and were correctly identified as such.
The bottom-left corner contains the FALSE Negatives. These patients had heart disease, but the algorithm said they didn't.
Lastly, the top-right corner contains the FALSE Positives. These are the patients who do not have heart disease, but the algorithm predicted that they did.
Raw counts alone rarely single out the best method, because different methods often end up with values close to each other. That is why we use more sophisticated metrics like sensitivity, specificity, ROC, and AUC, which give a more accurate picture of performance.
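A minimal sketch with hypothetical labels, reading the four cells out of scikit-learn's confusion matrix:

```python
# Sketch: building a confusion matrix and reading off its four cells.
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # 1 = has heart disease, 0 = does not
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]  # hypothetical model predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")

# Sensitivity (recall) and specificity follow directly from these counts.
print("Sensitivity:", tp / (tp + fn))
print("Specificity:", tn / (tn + fp))
```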
Differentiate between Inductive and Deductive Learning?
In machine learning we feed data through a model to make predictions. Inductive learning draws conclusions from observations, whereas deductive learning starts from conclusions and infers speculations from them. Transductive learning helps create this loop of continuous inspection and inference.
When would you rather have too many false positives, and when too many false negatives?
Taking sides depends on the scenario and the domain where ML is applied. When ML is used for detecting spam emails, a false positive means an important email gets marked as spam.
And when ML is used in medical testing, a false negative makes the situation risky by classifying a report as fine when things are not good with the patient.
How can you make a more accurate prediction?
Model accuracy is a subset of model performance, and the two are directly proportional to each other: the better the model performs, the more precise its predictions become.
Differentiate between Gini Impurity & Entropy in a Decision tree?
Gini impurity and entropy are the metrics used to decide how to split a decision tree. Entropy is used to reduce the uncertainty in the output label, and the information gain of a split is calculated from it. Gini impurity measures how often a randomly picked sample would be misclassified if it were labeled randomly according to the class distribution in its branch.
Entropy captures the haziness or disorder in the data, and it decreases as we get closer to a leaf node.
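A minimal sketch computing both metrics for a node's class proportions:

```python
# Sketch: Gini impurity and entropy for a node's class distribution.
import numpy as np

def gini(p):
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # ignore empty classes to avoid log(0)
    return -np.sum(p * np.log2(p))

# Class proportions in a node: a pure node scores 0 on both metrics.
print(gini([0.5, 0.5]), entropy([0.5, 0.5]))  # 0.5, 1.0
print(gini([0.9, 0.1]), entropy([0.9, 0.1]))  # 0.18, ~0.469
print(gini([1.0, 0.0]), entropy([1.0, 0.0]))  # 0.0, 0.0
```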
Illustrate the Ensemble learning technique used in ML?