24 June 2018

Machine learning deserves to be treated as a field separate from software engineering: it has a different focus and is rapidly growing in importance. I’m not just jumping on the bandwagon, either; I started on ML projects and coursework around 10 years ago.

For exploration and learning, Python is probably one of the best environments to work in: many of the most important ML libraries are available for it, and it is highly popular in the ML community.

That being said, working with Python on macOS takes a little extra setup if you want a clean and manageable environment. The key idea is to avoid touching the system Python that ships with macOS. Leave that one alone for the OS to use, because it can change during OS upgrades.

Instead, I recommend pyenv and pyenv-virtualenv for creating separate Python environments that can be reserved for machine learning and other purposes.

The reason is that once you set up your ML libraries, you don’t want them or their dependencies to change when you need Python for something else. It also lets you keep separate versions of Python for separate purposes; Python 2.7.x, for example, is still needed for tasks like building Chromium.

I recommend pyenv-virtualenv in addition to pyenv because it creates separate environments under the same version of Python, whereas pyenv by itself is for installing different versions of Python.
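
To confirm that a shell is no longer picking up the system Python, a small check along these lines can help. This is only a sketch: the path prefixes are an assumption about where macOS keeps its own interpreter, and is_system_python is a hypothetical helper name, not part of pyenv.

```python
import sys

# Assumed locations of the Apple-provided Python; adjust for your macOS release.
SYSTEM_PREFIXES = ("/usr/bin/", "/System/Library/")


def is_system_python(executable):
    """Return True if the interpreter path looks like the macOS system Python."""
    return executable.startswith(SYSTEM_PREFIXES)


if __name__ == "__main__":
    # sys.executable is the path of the interpreter running this script.
    kind = "system" if is_system_python(sys.executable) else "not system"
    print(sys.executable, "->", kind)
```

An interpreter managed by pyenv normally lives under the pyenv root in your home directory, so it should be reported as "not system".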

Please see the latest installation docs for each tool on their respective GitHub pages.

On macOS, you will want Python installed as a framework to use features like integrated plotting with matplotlib. This can be done by setting the PYTHON_CONFIGURE_OPTS environment variable when installing your desired version.

$ PYTHON_CONFIGURE_OPTS="--enable-framework" pyenv install 3.6.5

This is only needed for the base version and is unnecessary for subsequent virtual environments created by a command like:

$ pyenv virtualenv 3.6.5 python-3-for-ml
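
If you want to verify that the framework option took effect, one way is to ask the interpreter how it was built. The sketch below reads the PYTHONFRAMEWORK build variable through the standard sysconfig module; is_framework_build is a hypothetical helper name chosen for illustration.

```python
import sysconfig


def is_framework_build():
    """Return True if this interpreter was built as a macOS framework."""
    # PYTHONFRAMEWORK holds the framework name for framework builds and is
    # empty (or missing on some platforms) otherwise.
    return bool(sysconfig.get_config_var("PYTHONFRAMEWORK"))


print("framework build:", is_framework_build())
```

Run it with the interpreter you just installed; a virtual environment created from that version reports the same value as its base.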

Then, switching into a virtual environment can be done with pyenv alone. For example:

$ pyenv shell python-3-for-ml

You can always see what is installed using the versions command. On my system, I have something like the following:

$ pyenv versions
* 2.7.15 (set by PYENV_VERSION environment variable)
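
The same information can be checked from inside Python. As the output above suggests, pyenv shell works by setting the PYENV_VERSION environment variable, so a script can report both that selection and the interpreter actually running. This is a minimal sketch; describe_environment is a hypothetical helper name.

```python
import os
import sys


def describe_environment(environ):
    """Summarize the pyenv selection and the running interpreter version."""
    # PYENV_VERSION is only present when a version was chosen for this shell.
    selected = environ.get("PYENV_VERSION", "(not set)")
    running = "{}.{}.{}".format(*sys.version_info[:3])
    return "PYENV_VERSION={}; interpreter={}".format(selected, running)


print(describe_environment(os.environ))
```

After running pyenv shell python-3-for-ml, the first field should read PYENV_VERSION=python-3-for-ml.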

When using pip for package management, installed packages can be listed with pip list.

For example:

$ pip list
appnope (0.1.0)
backcall (0.1.0)
bleach (2.1.3)
cycler (0.10.0)
dbgp (1.0)
decorator (4.3.0)
entrypoints (0.2.3)
html5lib (1.0.1)
ipykernel (4.8.2)
ipython (6.4.0)
ipython-genutils (0.2.0)
ipywidgets (7.2.1)
jedi (0.12.0)
Jinja2 (2.10)
jsonschema (2.6.0)
jupyter (1.0.0)
jupyter-client (5.2.3)
jupyter-console (5.2.0)
jupyter-core (4.4.0)
kiwisolver (1.0.1)
MarkupSafe (1.0)
matplotlib (2.2.2)
mistune (0.8.3)
nbconvert (5.3.1)
nbformat (4.4.0)
notebook (5.5.0)
numpy (1.14.3)
pandas (0.23.0)
pandocfilters (1.4.2)
parso (0.2.1)
pexpect (4.6.0)
pickleshare (0.7.4)
pip (9.0.3)
prompt-toolkit (1.0.15)
ptyprocess (0.5.2)
Pygments (2.2.0)
pyparsing (2.2.0)
python-dateutil (2.7.3)
pytz (2018.4)
pyzmq (17.0.0)
qtconsole (4.3.1)
scikit-learn (0.19.1)
scipy (1.1.0)
Send2Trash (1.5.0)
setuptools (39.0.1)
simplegeneric (0.8.1)
six (1.11.0)
sklearn (0.0)
terminado (0.8.1)
testpath (0.3.1)
tornado (5.0.2)
traitlets (4.3.2)
wcwidth (0.1.7)
webencodings (0.5.1)
widgetsnbextension (3.2.1)
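
If you want to compare two environments, it can be handy to turn a listing like the one above into a dictionary. The sketch below parses the "name (version)" layout shown here; parse_pip_list is a hypothetical helper, and newer pip releases print a column layout by default, so the pattern would need adjusting for them.

```python
import re

# Matches lines such as "numpy (1.14.3)" from the listing format shown above.
LINE = re.compile(r"^(?P<name>\S+) \((?P<version>[^)]+)\)$")


def parse_pip_list(text):
    """Map package names to versions, skipping lines that do not match."""
    packages = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            packages[match.group("name")] = match.group("version")
    return packages


sample = "numpy (1.14.3)\npandas (0.23.0)\nscikit-learn (0.19.1)"
print(parse_pip_list(sample))
```

Parsing the output from two virtual environments and comparing the dictionaries shows which packages or versions differ between them.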

One last note: I don’t recommend upgrading pip to v10 yet. I can’t recall the exact reason right now, but I vaguely remember it was due to an incompatibility with an ML library.
