Data science packages

Hey :D

I’ve already installed Miniconda3, and Python along with it, on Linux.

Now, as part of my course, I need to install the following data science packages:

  • numpy
  • scipy
  • matplotlib
  • statsmodels
  • pandas
  • seaborn

The Installing Python for Data Analysis guide covers which version of Python to download, the difference between Anaconda and Miniconda, and their package manager, conda. There’s a link, Getting started with conda, that talks about the conda manager, and it says that the default environment in the Anaconda/Miniconda terminal is (base). It also mentions that “You don’t want to put programs into your base environment, though. Create separate environments to keep your programs isolated from each other.”

Does that mean that I need to create a new environment for each package? Or can I create one environment called, for example, “yummy” and save all the packages there? Or should I just save everything in the “base” environment?

What are your recommendations?

Thanks in advance :smiley:

For those particular packages you could probably put them all in the same environment without much issue since, so far as I’m aware, they’re all designed to work together. If you were starting a new project on something entirely different, that would be a good time to create a new environment.
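As a rough illustration of the single-environment option, here is a small Python sketch that only builds and prints the `conda create` command for one environment holding all six packages. It does not run conda. The name “yummy” comes from the question, and the helper `conda_create_command` is just a made-up name for this example.

```python
import shlex

# The six packages listed in the question.
PACKAGES = ["numpy", "scipy", "matplotlib", "statsmodels", "pandas", "seaborn"]


def conda_create_command(env_name, packages):
    """Return the argument list for creating one environment with all packages.

    Hypothetical helper for illustration; it only assembles the command.
    """
    return ["conda", "create", "--name", env_name, *packages]


# "yummy" is the example environment name from the question.
print(shlex.join(conda_create_command("yummy", PACKAGES)))
```

You would paste the printed line into your terminal and then switch to the new environment with `conda activate yummy`.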

Separate environments are ideal for avoiding incompatibilities between packages and problems caused by constant version changes. Since they’re relatively easy to create and manage, you can make as many as you like without much issue. At the end of the day it’s up to you to decide when a new one would be a good idea. I’d guess that many projects in the same area have almost identical package requirements, so you could potentially get away with just the one. It could be good practice to create new ones, especially for bigger projects, if you want to get the hang of it. It is worth knowing how to use them.

It’s possible you won’t be able to have the most recent version of every one of those packages when they’re installed together. Because some of them depend on others, they may require a specific version of another package, but that’s rarely a problem. More recent versions generally just add features you’re unlikely to need. It’d be easier to keep the versions you began with while working on a single project, to avoid versioning issues, and only update if you need to.
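If you want to note which versions you started with, something like the sketch below could help. It uses the standard library’s `importlib.metadata` and assumes each package is registered under the same name as in the list from the question, which I’d expect to be the case for these six. The function name `installed_versions` is made up for this example.

```python
from importlib import metadata

# The six packages listed in the question.
PACKAGES = ["numpy", "scipy", "matplotlib", "statsmodels", "pandas", "seaborn"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the environment running this script.
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```

Run it with the environment activated, since it reports on whichever Python is executing the script.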

Thank you very much for your advice :smiley:

@dev0611390716,

To expand on @tgrtim’s answer: putting everything in your base conda environment is less of a concern than having no separate environments at all when you are using a Python that was not installed via Anaconda or Miniconda. So long as you are installing your packages into your base conda environment, you can be sure they are already separated from your operating system’s Python, which is the main concern.
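If you ever want to double-check which Python you are actually running, a minimal sketch like this one may help. It relies on two things I believe hold for conda: an environment folder contains a `conda-meta` directory, and `conda activate` sets the `CONDA_DEFAULT_ENV` variable. Treat the result as a hint rather than proof. `describe_interpreter` is a made-up name.

```python
import os
import sys


def describe_interpreter():
    """Collect a few hints about where the running Python lives."""
    prefix = sys.prefix
    return {
        "executable": sys.executable,
        "prefix": prefix,
        # Assumption: conda environments keep package records in conda-meta.
        "looks_like_conda": os.path.isdir(os.path.join(prefix, "conda-meta")),
        # Assumption: set by `conda activate`; None if nothing is activated.
        "active_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```

If `executable` points somewhere under your Miniconda folder rather than a system path such as `/usr/bin`, your packages are going into conda and not into the operating system’s Python.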

However, once you are accustomed to using conda and you start moving on to more complex projects, I would highly recommend getting in the habit of creating a new environment for each project. This is because, in addition to avoiding incompatibilities, environments are used to ensure reproducibility, both for yourself and for others you share your work with.

If you download all the packages you’ve ever needed into the same environment, it will be harder to know which packages and versions you used for a specific project. This makes it more difficult for people to reproduce your work, for you to share your code and collaborate, and for you to continue at a later date. Things are further complicated when you start updating package versions. Sometimes you might do this manually, and sometimes downloading one package will require a newer (or older) version of a package that you already have. Once in a while, these updates may break your code for a certain project, and tracking down the issue when that occurs is harder than avoiding it in the first place by having a dedicated environment for your project.
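To make the reproducibility point concrete, here is a small sketch that writes out the text of a per-project `environment.yml` with pinned versions. The project name and the version numbers are placeholders I picked for illustration, not recommendations, and `environment_yaml` is a made-up helper.

```python
def environment_yaml(env_name, pinned):
    """Build the text of a simple conda environment file.

    `pinned` maps package names to a version string, or None to leave unpinned.
    """
    lines = [f"name: {env_name}", "dependencies:"]
    for package, version in pinned.items():
        # A single "=" is conda's version-match syntax in environment files.
        lines.append(f"  - {package}={version}" if version else f"  - {package}")
    return "\n".join(lines) + "\n"


# Hypothetical project name and illustrative version numbers.
print(environment_yaml("my-project", {"numpy": "1.26", "pandas": "2.1", "seaborn": None}))
```

Saved as `environment.yml`, a file like this can be turned back into an environment with `conda env create -f environment.yml`. In practice you would more likely let conda generate it from an existing environment with `conda env export`.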

So, the moral of my story is: it is fine to just use the base conda environment when you first start, but as you gain experience and begin working on more complicated projects, using dedicated environments is the smart choice.

Thanks for your input :slight_smile: