1.1 Installing Python the right way¶
Almost every Python disaster in a research group can be traced to one of three root causes: using the operating system's bundled Python, installing the wrong distribution, or mixing pip and conda carelessly. This section shows you the path that avoids all three.
Why not the Python that came with my computer?¶
macOS and most Linux distributions ship with a system Python. Do not use it for science. The system Python:
- is tied to OS updates — upgrading macOS may silently change your interpreter;
- often lacks the headers needed to build scientific packages;
- requires sudo to install anything globally, which corrupts the system if you ever pin a different version of a shared library;
- is missing important compiled dependencies (BLAS, LAPACK, FFTW) or links against slow defaults.
The system Python exists to run system utilities, not your DFT pipeline.
Why not Anaconda?¶
Anaconda, the full distribution, bundles hundreds of packages you do not need and comes with baggage you might not want (licence restrictions, telemetry-enabled defaults, an enormous installer). Since 2020, Anaconda's terms of service have also required a paid licence for use in organisations with 200 or more employees.
We use a minimal installer instead: Miniconda, or its community-maintained counterpart Miniforge, which also includes the faster mamba command. Either gives you:
- the conda package manager,
- a small base environment,
- access to the conda-forge channel (community-maintained, well-curated, license-clean),
and nothing else. You install only what you need.
Recommendation
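Install Miniforge3. It is a minimal, Miniconda-style installer maintained by the conda-forge community, preconfigured with conda-forge as the default channel and shipping mamba as a faster drop-in for the conda command. One installer, no licensing concerns, no surprises.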
Step-by-step installation¶
macOS (Apple Silicon and Intel)¶
Open Terminal and run:
# Pick the right arch — uname -m prints arm64 on Apple Silicon, x86_64 on Intel
ARCH=$(uname -m)
curl -L -o miniforge.sh \
"https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-${ARCH}.sh"
bash miniforge.sh -b -p "$HOME/miniforge3"
rm miniforge.sh
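The -b flag runs the installer in batch mode, which accepts the licence non-interactively; -p chooses the install prefix. Now initialise your shell by running "$HOME/miniforge3/bin/conda" init zsh and then exec zsh (zsh is the default shell on current macOS; substitute bash in both commands if that is what you use).
The exec command restarts your shell so the new conda is on your PATH. Verify with which conda, which python, and python --version.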
You should see paths under ~/miniforge3 and a Python version of 3.11 or newer.
Linux (x86_64 and aarch64)¶
ARCH=$(uname -m)
wget -O miniforge.sh \
"https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-${ARCH}.sh"
bash miniforge.sh -b -p "$HOME/miniforge3"
rm miniforge.sh
"$HOME/miniforge3/bin/conda" init bash
exec bash
If you are on a cluster, install into $HOME rather than a shared directory. Conda environments hard-code their absolute install path in scripts and metadata, so they break if moved; installing to your own home directory is the path of least resistance.
Windows¶
Use the Windows Subsystem for Linux (WSL2) with Ubuntu, then follow the Linux instructions above. Native Windows Python works but is a second-class citizen for materials simulation — most DFT and MD codes assume a POSIX environment.
What's an environment, and why have many?¶
An environment is an isolated directory containing a specific Python interpreter and a specific set of packages. Different projects can live in different environments without interfering. If you upgrade numpy in one and it breaks an old script in another, the old script does not care, because its environment still has the old numpy.
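To see this isolation from the inside, the sketch below asks the running interpreter where it lives and where it installs packages. It uses only the standard library; the helper name describe_environment is illustrative, and CONDA_DEFAULT_ENV is the variable conda sets on activation (it is absent outside conda).

```python
import os
import sys
import sysconfig


def describe_environment():
    """Return where this interpreter lives and where its packages go."""
    return {
        "executable": sys.executable,                        # the interpreter binary
        "prefix": sys.prefix,                                # root directory of the environment
        "site_packages": sysconfig.get_paths()["purelib"],   # where pure-Python packages are installed
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),    # set by conda activate, else None
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key:>14}: {value}")
```

Run it once in each of two environments: the prefix and site_packages lines differ, which is why upgrading a package in one environment cannot affect the other.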
You should have:
- a base environment that you never touch except to update conda itself;
- one environment per project (or per closely related cluster of projects).
Never install into base
The moment you pip install something into your base environment, you have lost the ability to reset cleanly. Treat base as read-only.
Creating the materials-simulation environment¶
Save the following as environment.yml in the root of a new project directory:
name: materials-simulation
channels:
  - conda-forge
  - pytorch
dependencies:
  - python=3.11
  - numpy>=1.26
  - scipy>=1.11
  - matplotlib>=3.8
  - jupyterlab
  - ipykernel
  - ase>=3.22
  - pymatgen>=2024.1
  - pytorch>=2.2
  - cpuonly  # remove on a GPU machine, add pytorch-cuda=12.1
  - pandas
  - h5py
  - pyyaml
  - tqdm
  - pip
  - pip:
      - mace-torch>=0.3.6
      - nequip
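Create the environment by running mamba env create -f environment.yml from the project directory.
If you do not have mamba, swap it for conda: the syntax is identical and only the dependency solve is slower. The first solve may take a few minutes; conda-forge is a large channel.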
Activating and deactivating¶
conda activate materials-simulation # turn it on
python -c "import numpy; print(numpy.__version__)"
conda deactivate # turn it off
Your shell prompt should show (materials-simulation) while the environment is active. If it does not, your shell init step earlier did not take effect; rerun conda init and open a fresh terminal.
requirements.txt vs environment.yml¶
You will see both formats in the wild. They are not interchangeable.
- requirements.txt is the pure-pip format. One package per line, optional version pins (numpy==1.26.2). It records only Python packages.
- environment.yml is conda's format. It records the Python version, non-Python dependencies (BLAS, MPI, CUDA), and channels. It can embed a pip: section for packages not on conda-forge.
For materials simulation, environment.yml is the right default because we depend on compiled native libraries (BLAS for NumPy, libxc for DFT, CUDA for PyTorch). A bare requirements.txt cannot specify these.
A pragmatic rule:
- Use environment.yml to recreate a project environment from scratch.
- Use pip freeze > requirements.txt to snapshot exact pinned versions for a finished paper.
We come back to this in Section 1.4.
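As a rough illustration of what such a snapshot contains, the sketch below builds name==version lines for the active environment with the standard-library importlib.metadata module. It approximates pip freeze rather than replacing it, and the function name snapshot_pins is hypothetical.

```python
from importlib import metadata


def snapshot_pins():
    """Return sorted 'name==version' lines for every installed distribution."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack a name
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    print("\n".join(snapshot_pins()))
```

In a conda environment, pip freeze can emit name @ file:///... lines for conda-installed packages instead of plain pins; reading versions from the installed metadata, as here, sidesteps that, at the cost of recording nothing about non-Python libraries.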
Verifying the install¶
With the environment active, run:
python - <<'EOF'
import numpy, scipy, matplotlib, ase, pymatgen, torch
print(f"numpy {numpy.__version__}")
print(f"scipy {scipy.__version__}")
print(f"matplotlib {matplotlib.__version__}")
print(f"ase {ase.__version__}")
print(f"pymatgen {pymatgen.__version__}")
print(f"torch {torch.__version__} CUDA={torch.cuda.is_available()}")
EOF
If all lines print without a traceback, your install is healthy. A quick functional test:
# verify_install.py
import numpy as np
from ase.build import bulk
si = bulk("Si", cubic=True)
print(si) # 8-atom Si conventional cell
print("Volume:", si.get_volume(), "ų")
print("First-NN distance:", si.get_all_distances(mic=True)[0, 1], "Å")
# A trivial NumPy check
A = np.random.default_rng(0).normal(size=(3, 3))
A = A + A.T # symmetric
eigs = np.linalg.eigvalsh(A)
print("Eigenvalues:", eigs)
Run with python verify_install.py. You should see a silicon cell, a volume near 160 ų, and three real eigenvalues.
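The expected numbers can be checked by hand. The sketch below assumes ASE's default lattice constant for silicon is a = 5.43 Å (an assumption about ASE's built-in reference data; compare with the printed cell if your version differs) and uses only the geometry of the diamond structure: the conventional cell is a cube of side a, and nearest neighbours sit a quarter of the body diagonal apart.

```python
import math

a = 5.43  # Å, assumed lattice constant of the cubic Si cell

volume = a ** 3                      # volume of the cubic cell, in Å^3
nn_distance = a * math.sqrt(3) / 4   # quarter of the body diagonal, in Å

print(f"Expected volume:            {volume:.1f} Å^3")
print(f"Expected first-NN distance: {nn_distance:.3f} Å")
```

Under that assumption the volume comes out at about 160.1 Å^3 and the nearest-neighbour distance at about 2.351 Å, so values from verify_install.py that differ noticeably point to a different lattice constant or a broken install.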
Common pitfalls¶
"I have three Pythons. Which one is running?"¶
Check with which -a python python3, which lists every match in lookup order.
If which -a lists multiple, the first on your PATH is what runs. When a conda environment is active, this should be the one inside ~/miniforge3/envs/<your-env>/bin/.
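The same PATH lookup can be reproduced from Python, which helps when shell aliases or functions muddy the picture. This is a sketch using only the standard library; find_all_on_path is a hypothetical helper, not part of any package.

```python
import os
import sys


def find_all_on_path(name, path=None):
    """List every executable called `name` on PATH, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        # An empty PATH entry means the current directory on POSIX systems.
        candidate = os.path.join(directory or ".", name)
        # Keep regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            if candidate not in matches:
                matches.append(candidate)
    return matches


if __name__ == "__main__":
    for name in ("python", "python3"):
        print(name, "->", find_all_on_path(name) or "not found")
    print("this interpreter:", sys.executable)
```

The first entry in each list is the one the shell would start; the last line shows which interpreter is actually executing the script, and the two should agree.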
pip install ignored my environment¶
If pip lives outside your environment, it installs to the wrong place. Always check with which pip or python -m pip --version, both of which print the location pip runs from.
The path should be inside the active environment. If it is not, install pip into the environment with conda install pip, then close and reopen your terminal.
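The check can also be made from inside Python. The sketch below, which assumes nothing beyond the standard library, locates the pip module the current interpreter would import and tests whether it sits under sys.prefix, the root of the active environment. The function name pip_is_in_active_env is illustrative.

```python
import importlib.util
import os
import sys


def pip_is_in_active_env():
    """Return True/False if pip is inside/outside sys.prefix, None if pip is absent."""
    spec = importlib.util.find_spec("pip")
    if spec is None or spec.origin is None:
        return None
    pip_dir = os.path.realpath(os.path.dirname(spec.origin))
    env_root = os.path.realpath(sys.prefix)
    try:
        return os.path.commonpath([pip_dir, env_root]) == env_root
    except ValueError:  # e.g. paths on different drives on Windows
        return False


if __name__ == "__main__":
    print("pip inside this environment:", pip_is_in_active_env())
```

Invoking pip as python -m pip install ... rather than as a bare pip command ties the install to the interpreter you just checked.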
"It worked yesterday and now it doesn't"¶
Three usual suspects:
- You ran pip install <something> into base and it overrode a conda-managed library. Recreate the environment from environment.yml.
- A system update changed your shell init file. Re-run conda init.
- Your environment file was edited and re-solved; the new solve picked different versions. This is exactly why we pin versions before publishing; see Section 1.4.
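When chasing this kind of failure, a useful first step is to record what is actually installed today and compare it with what you expected. The sketch below reports installed versions for a list of distribution names using the standard-library importlib.metadata module; installed_versions is a hypothetical helper, and the names in the example are taken from the environment.yml in this section.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    report = {}
    for name in names:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    wanted = ["numpy", "scipy", "matplotlib", "ase", "pymatgen", "torch"]
    for name, version in installed_versions(wanted).items():
        print(f"{name:<12} {version or 'NOT INSTALLED'}")
```

Saving this output next to your results gives you something to diff the next time a script that worked yesterday stops working.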
Permission errors¶
If you see Permission denied or are tempted to sudo pip install, stop. You are about to corrupt your system Python. Activate your conda environment first; pip inside an environment never needs sudo.
Mixing pip and conda¶
conda and pip do not know about each other's metadata. A safe rule: install everything with conda/mamba first, then only use pip for packages unavailable on conda-forge (e.g., mace-torch). Never use pip to upgrade something conda installed.
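To audit an existing environment against this rule, the sketch below groups installed Python distributions by the tool named in their INSTALLER metadata file. Treat it as a heuristic: it assumes that pip writes pip there and that conda-built packages record conda, which is common but not guaranteed, and packages without the file are reported as unknown. The function name packages_by_installer is illustrative.

```python
from collections import defaultdict
from importlib import metadata


def packages_by_installer():
    """Group installed distributions by the tool recorded in their INSTALLER file."""
    groups = defaultdict(list)
    for dist in metadata.distributions():
        # read_text returns None when the distribution has no INSTALLER file.
        installer = (dist.read_text("INSTALLER") or "").strip().lower()
        name = dist.metadata.get("Name") or "unnamed"
        groups[installer or "unknown"].append(name)
    return {tool: sorted(names) for tool, names in groups.items()}


if __name__ == "__main__":
    for tool, names in sorted(packages_by_installer().items()):
        print(f"{tool}: {len(names)} packages")
```

A short pip list containing only the packages from the pip: section of environment.yml suggests the rule has been followed; a core library such as numpy appearing under pip is worth investigating.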
Apple Silicon and Intel-only wheels¶
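A few packages still ship only x86_64 wheels. If you hit No matching distribution found, try conda-forge first (it usually has an osx-arm64 build); if it does not, create an Intel environment by setting CONDA_SUBDIR=osx-64 in front of the create command, for example CONDA_SUBDIR=osx-64 mamba create -n <env-name> python=3.11, then activate it and run conda config --env --set subdir osx-64 so later installs stay on the same architecture.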
This forces an Intel environment on Apple Silicon; macOS will run it under Rosetta translation.
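To confirm which architecture an activated environment actually runs as, you can ask the interpreter itself. A minimal sketch using the standard-library platform module:

```python
import platform
import sys

# 'arm64' for a native Apple Silicon build, 'x86_64' for an Intel build
arch = platform.machine()

print(f"Python {platform.python_version()} running as {arch} on {sys.platform}")
if sys.platform == "darwin" and arch == "x86_64":
    print("Intel build: on Apple Silicon hardware this runs under Rosetta.")
```

Run it in both your native and your Intel environment; the reported architecture should differ between the two.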
Beyond the basics: lockfiles¶
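For a project that absolutely must reproduce, generate a lockfile. Two common options are conda's built-in export, for example conda env export > environment.lock.yml (the file name is arbitrary), and the separate conda-lock tool.
A lockfile records every transitively resolved package version. Commit it. To recreate the exact environment six months later, build from the lockfile rather than from environment.yml, for example with conda env create -f environment.lock.yml.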
We will revisit lockfiles, hashes, and full reproducibility in Section 1.4. For now, you have a working scientific Python install. Time to use it.