I am diving into the Data Science Career path and am thinking about all the cool libraries there are in Python for me to use. Since I am a newbie with libraries, I may be inclined to trust the robustness and capabilities of libraries more readily than others.
However, when dealing with data, especially sensitive data (credit information, medical records, etc.), how can I know that a library I download and run won’t do something malicious, such as steal my data or take my computer hostage? I know that many people use ‘popular’ libraries, but I don’t want to just follow the herd blindly.
How do I cross-examine the safety and suitability of a particular library?
Quite a few of the popular projects for data analysis are open source so you can freely access and review the source code.
The prevailing consensus about open source projects is that “more eyes” - more people with access to the source code - make it harder for anyone to introduce harmful or malicious code than it would be in a closed source project.
Ultimately, though, unless you’re willing to review every line of code yourself, the system boils down to trust: are you comfortable trusting the project maintainers, contributors and the ecosystem in general not to pull a fast one and do something dodgy?
If the library is open source - Pandas being an example of one such project - then you can freely review the source to satisfy yourself that there’s nothing untoward going on. (For absolute clarity, I’m not suggesting there’s anything untoward in Pandas!)
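If you want a starting point for that kind of review, here is a minimal sketch that finds where a package's source files live on disk without importing (and therefore without running) it, then lists which files import modules you might want to look at more closely. The WATCHED list and the function names are my own assumptions for illustration, not an established checklist, and a clean result proves nothing on its own; it only tells you where to start reading.

```python
import ast
import importlib.util
from pathlib import Path

# Hypothetical watch list: standard-library modules that open network
# connections, spawn processes, or call into native code.
WATCHED = {"socket", "subprocess", "urllib", "http", "ctypes"}


def package_files(name):
    """List the .py files of a top-level package without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return []  # not installed, or a namespace package with no single origin
    origin = Path(spec.origin)
    if origin.suffix != ".py":
        return []  # built-in, frozen, or compiled extension: no source to read
    if spec.submodule_search_locations:
        # A package: walk the directory that holds its __init__.py.
        return sorted(origin.parent.rglob("*.py"))
    return [origin]  # a single-file module


def watched_imports(path):
    """Return the watched top-level modules that one source file imports."""
    source = Path(path).read_text(encoding="utf-8", errors="replace")
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import, not a relative one.
            found.add(node.module.split(".")[0])
    return found & WATCHED


if __name__ == "__main__":
    # "json" ships with Python; swap in the name of any installed package.
    for file in package_files("json"):
        hits = watched_imports(file)
        if hits:
            print(file, sorted(hits))
```

Plenty of perfectly legitimate libraries import these modules, so treat any hit as a pointer to code worth reading rather than as evidence of anything dodgy.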
Otherwise, it’s a calculated decision; if a high number of people are using it with no reported issues or incidents, do you take that as a sign it’s safe?
We never really know a library, or what it offers, until we dig into its core and try to solve for ourselves the same problems it makes look easy. Experienced developers created those libraries to abstract away a lot of the redundant and tedious operations that a vanilla solution requires. If you don’t have the problem, you don’t need the library.
jQuery is in its death throes because Internet Explorer is no longer around to deal with. Edge is W3C compliant, for the most part. Newer libraries have emerged for that environment, just as newer libraries emerged with the advent of Python 3. The language change abstracted away some minor things and opened up a whole new vista for libraries to focus on the real nuts and bolts. Understanding the problem itself gives you a leg up on judging whether a library actually helps solve it, or just uses smoke and mirrors.