What are some ways that probability in data science is used in large companies?
Probability is used for many applications in different fields and companies around the world.
One of the most common applications of data science is advertising. By using probability, companies such as YouTube and Facebook determine what kinds of ads are most relevant for a user.
Similarly, probability is also used by e-commerce companies like Amazon to determine what items to present to a user based on their purchase history.
The video-streaming site Netflix uses probability to match titles that you might watch, and will also change the artwork for shows to receive more clicks.
Probability is also fundamental to areas such as machine learning, which is applied in systems such as Google DeepMind's AlphaGo and self-driving cars.
In the restaurant industry, probability could guide what kind of recommendations to make to a user who has already eaten at the restaurant before, or dessert recommendations based on lunch choices, optimised for parameters such as maximizing tips, profit and/or reviews.
I think there is significant potential, especially for businesses in that industry that operate at a large scale with multiple stores and many data points.
Some companies will use probability to forecast sales or to project consumer behavior based on the time of year.
To get more technical, chemical instrumentation companies will use probabilistic methods to compare the performance of their instrument against a fit of the analytical expression of the fundamental equations that describe the signal they are assessing (e.g. Raman spectroscopy simulations).
I tried to make sense of the program that runs the ‘same birth-date in a room full of people’ exercise. And while I could not understand the Python code, I figured out that having 83 people in the room leads to a ‘100%’ probability of 2 people in the room sharing a birth-date.
Intuitively, I had thought that the only sure-fire way of having a 100% probability of 2 people in the same room sharing a birth-date would be to have either 366 or 367 people (depending on whether the code treats the range of available dates as a regular year or a leap year).
Why is my intuition not aligned with the answers of the Python coded simulation? Any guidance on where I could be going wrong with my understanding?
If I’m understanding this explanation correctly, the wrinkle is that every possible pair among the 83 people is compared, which is 3,403 possible pairs, far more than the 366 dates available. This is different from the way we intuitively think about it, which usually entails picking one person and then comparing them to the remaining people in the room. Let me know if this helps or if I’ve made a mistake in the math.
No, you make perfect sense. An easier way to put it could be:
Initially, when the room only had 2, 3 or a similarly small number of people -> ‘the pool of collected dates’ was way smaller than ‘the pool of available dates’.
But that changed rapidly past 20 or 30 people or so, since -> ‘the pool of collected dates’ began to approach the size of ‘the pool of available dates’.
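For anyone who wants to check the numbers, here is a minimal sketch of the exact calculation, assuming 365 equally likely birthdays and ignoring leap years. The function name is my own and is not taken from the course program. It multiplies the chances that each new person avoids every birthday already in the room, then subtracts the result from 1. Under these assumptions the probability gets extremely close to 1 long before 366 people but only reaches exactly 1 at 366, so a displayed ‘100%’ at 83 is most likely a rounded value.

```python
def shared_birthday_probability(people, days=365):
    """Probability that at least two of `people` share a birthday.

    Assumes every day is equally likely and ignores leap years.
    """
    all_distinct = 1.0
    for already_in_room in range(people):
        # The next person must avoid every birthday already taken.
        # max(..., 0) keeps the factor at zero once the room holds
        # more people than there are days.
        free_days = max(days - already_in_room, 0)
        all_distinct *= free_days / days
    return 1.0 - all_distinct


if __name__ == "__main__":
    for people in (2, 23, 50, 83, 366):
        print(people, shared_birthday_probability(people))
```

Working with the running product also avoids ever building a huge number like 365 to the power of the number of people.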
I have also tested that the program can handle a maximum of 120 people in the same room. Beyond that the program raises an error. So the 100% probability holds for a range of 83 to 120 people. Why is it limited to 120?
In healthcare or digital health, probability may be used to determine the likelihood of symptoms flaring up, for example in period/fertility tracking apps or for a chronic illness where a flare-up means you need to visit your doctor.
Part of the code calculates a denominator equal to 365 to the power of the number of people. So, 365^121, which is a massive number. The code encounters an error here because it can’t convert that massive integer into a float (line 50).
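To see that limit for yourself, here is a small sketch. It does not reproduce the course program; the helper names are my own, and the exact-fraction workaround is just one possible way around the problem. Python integers can grow without bound, but a float tops out at roughly 1.8e308, and 365^121 is larger than that while 365^120 still fits.

```python
import sys
from fractions import Fraction


def fits_in_float(big_integer):
    """Return True if the integer can be converted to a float."""
    try:
        float(big_integer)
        return True
    except OverflowError:
        return False


def shared_birthday_probability_exact(people, days=365):
    """Same-birthday probability using exact integer arithmetic.

    Hypothetical workaround: keep the numerator and denominator as
    integers inside a Fraction and convert only the final ratio.
    """
    distinct_ways = 1
    for already_in_room in range(people):
        distinct_ways *= max(days - already_in_room, 0)
    return float(1 - Fraction(distinct_ways, days ** people))


if __name__ == "__main__":
    print("largest float:", sys.float_info.max)
    print("365**120 fits:", fits_in_float(365 ** 120))
    print("365**121 fits:", fits_in_float(365 ** 121))
    print("121 people:", shared_birthday_probability_exact(121))
```

Only the final ratio, which lies between 0 and 1, is turned into a float, so the size of the denominator no longer matters.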
Another common way that probability is used is for sales prediction. Based on previous sales of older products, companies can estimate what the sales of their next product will be.
In my industry (semiconductor), probability is broadly used when engineers want to characterise the efficiency or capability of a wafer production process. They use the probability of a given process yielding good dies.
In manufacturing and logistics, we use probability as part of inventory control. We forecast the likely future usage of products or components based on historical usage. This allows us to make educated decisions about how much of a product to buy or make, in the hope of having enough on hand to meet demand without wasting money on too much.
In hospital settings, we could use probability to anticipate equipment or drug usage in the case of surges and use that info to guide decisions on how much of each to keep in storage for each department.
Investors use probability distributions to predict returns over time on assets such as securities, and to hedge their risk. Probability distributions are also commonly used in risk management to measure the probability and the size of the losses that an investment portfolio may experience, based on a distribution of historical returns.
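As a rough illustration of the historical-returns idea, here is a minimal sketch. The returns are made-up numbers, and the function name and the 2% threshold are my own assumptions, not an industry-standard method. It counts how often past losses were at least as large as the threshold and how large those losses were on average.

```python
def loss_probability(returns, threshold):
    """Empirical probability and average size of losses >= threshold.

    `returns` are per-period returns as fractions (0.01 means +1%).
    `threshold` is a positive loss size (0.02 means a 2% loss).
    """
    bad_periods = [r for r in returns if r <= -threshold]
    probability = len(bad_periods) / len(returns)
    # Report the average loss as a positive number; 0.0 if none occurred.
    average_loss = -sum(bad_periods) / len(bad_periods) if bad_periods else 0.0
    return probability, average_loss


# Made-up daily returns, for illustration only.
sample_returns = [0.012, -0.004, 0.007, -0.031, 0.015,
                  -0.022, 0.003, 0.009, -0.011, 0.006]

if __name__ == "__main__":
    probability, average_loss = loss_probability(sample_returns, 0.02)
    print("share of periods with a loss of 2% or more:", probability)
    print("average loss in those periods:", average_loss)
```

With real data you would want far more than ten observations before trusting the estimate.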
I’m no expert by any stretch, but brainstorming off-hand, you could analyze erosion patterns to calculate the probability of things like future erosion rates, effects on terrain, etc. I imagine logging companies could keep track of their ratio of trees cut to trees planted to ensure they’re not causing too much deforestation. Perhaps park rangers who are keeping track of local flora and fauna could predict which species are at risk of becoming endangered or overpopulated.