class: center, middle # Data manipulation, analysis and visualisation in Python Specialist course Doctoral schools of Ghent University
27, 30 January and 3 February 2025 Joris Van den Bossche, Stijn Van Hoey https://github.com/jorisvandenbossche/DS-python-data-analysis
_With the support of the Flemish Government._
--- class: center, middle # Who are you? Go to https://hackmd.io/J3eLIyKRQ_Kc3PRV4DLCwQ?both
--- ### Joris Van den Bossche
jorisvandenbossche
* Open source software developer and teacher * Pandas, GeoPandas, scikit-learn, Apache Arrow --- ### Stijn Van Hoey
stijnvanhoey
* Freelance developer and teacher * Software engineer and IT team lead at [Fluves](https://www.fluves.com/) .center[ ![:scale 90%](./static/img/work_stijn_1.png)] --- class: middle, section_background # Setting up a working environment --- ## Setting up a working environment For the setup instructions, see the [setup page](https://jorisvandenbossche.github.io/DS-python-data-analysis/setup.html). --- class: left, middle 0. If not already downloaded the course material, installed conda and setup the environment, see [setup instructions section 1 and 2](https://jorisvandenbossche.github.io/DS-python-data-analysis/setup.html) 1. Next, make sure to complete 3 and 4 of the [setup instructions](https://jorisvandenbossche.github.io/DS-python-data-analysis/setup.html). > If you succesfully done all steps, put up your `green sticky note` on your laptop screen... Next: - Surf to and fill in [the questionnaire](https://hackmd.io/J3eLIyKRQ_Kc3PRV4DLCwQ?both) - In Jupyter Lab, start with the 'notebooks/00-jupyter_introduction.ipynb'. > Installation or setup issues? Put up your `red sticky note` on your laptop screen. --- class: center, middle When you see something like this... ![:scale 100%](./static/img/startup.png) ...relax, you're ready to start! --- class: center, middle ![:scale 100%](https://i.ytimg.com/vi/PlaYMh-u-2w/maxresdefault.jpg) --- class: middle, left ### Time is divided between - group sessions: we explain new concepts (aka 'theory') - practise sessions: you work on exercises or case studies In case of questions, remarks, suggestions, you can always interrupt us and just ask. During practise sessions, use the `red sticky note` on top of your laptop screen to let us know you have a question. ### Status check We will regularly ask for a check (ready with exercise, installation succesfull...). Use the `green sticky note` on top of your laptop screen to say 👍. ### Feel lost? Just ask either one of us, we are here to help you. --- class: middle, center ![:scale 80%](./static/img/issuetracker.png) Report bugs, typo's, suggestions... as issues ([New issue](https://github.com/jorisvandenbossche/course-python-data/issues/new)) or see the [contributing guidelines](https://github.com/jorisvandenbossche/course-python-data/blob/main/CONTRIBUTING.md) --- class: middle, section_background # Working with Python --- # Conda ### Why using conda? - Consistent package manager across Windows, Mac and Linux - Many precompiled packages available - Less problems with installation! -- ### Why different environments? - Manage the dependencies of a specific project/paper/group/... - You can install different version of Python and other packages alongside on your computer - Easily share environments with other --- ## Small overview of conda commands Creating a new environment: ``` conda create -n my_env python=3.9 pandas # or from environment file conda env create -f environment.yml ``` Activating an environment: ``` conda activate my_env ``` Install a new package: ``` conda install matplotlib # if not working, try: pip install ... ``` List all installed packages: `conda list` List all your environments: `conda info -e` See the docs: https://docs.conda.io/projects/conda/en/latest/user-guide/index.html --- class: center, middle ### Keep track of your python ecosystem
with an `environment.yml`
``` conda env export > environment.yml ``` --- # Writing Python code ## IPython console
.center[![:scale 75%](./static/img/ipython.png)] --- ## Interactive Development Environment (IDE) * [**Spyder**](https://pythonhosted.org/spyder/). The familiar environment for Matlab/Rstudio-users. * [**VS Code**](https://code.visualstudio.com/). General purpose editor with powerful plugin ecosystem. * [**PyCharm**](https://www.jetbrains.com/pycharm/): Popular for web-development and Django applications, powerful when doing 'application' development (packages, libraries, software) * [Eclipse + **pyDev plugin**](http://www.pydev.org/): If you already work in Eclipse, add the python environment * ... --- ## Jupyter Lab/Notebook
(*previously called IPython notebook*)
**Jupyter notebook** provides an **interactive** scripting environment,
ideal for exploration, prototyping,... .center[![:scale 70%](./static/img/notebook.png)] -- ...the stuff we're dealing with in this course! --- class: middle, section_background # Tidy data --- class: center, middle background-image: url(./static/img/tidy_data_paper.png) .footnote[Wickham, H. (2014)
Tidy Data, Vol. 59, Issue 10,
Journal of Statistical Software. doi:10.18637/jss.v059.i10] --- class: center, middle | WWTP | Treatment A | Treatment B | |:------|-------------|-------------| | Destelbergen | 8. | 6.3 | | Landegem | 7.5 | 5.2 | | Dendermonde | 8.3 | 6.2 | | Eeklo | 6.5 | 7.2 | --- class: center, middle | WWTP | Treatment | pH | |:------|:-------------:|:-------------:| | Destelbergen | A | 8. | | Landegem | A | 7.5 | | Dendermonde | A | 8.3 | | Eeklo | A | 6.5 | | Destelbergen | B | 6.3 | | Landegem | B | 5.2 | | Dendermonde | B | 6.2 | | Eeklo | B | 7.2 | --- class: center, middle .center[![:scale 100%](./static/img/tidy_data_scheme.png)] --- class: center, middle # How are you feeling? ![:scale 100%](http://esq.h-cdn.co/assets/15/51/980x490/landscape-1450137389-john-cleese.JPG) ### https://forms.gle/wmFH89ELcRKFYuWFA Please fill in the questionnaire! --- class: center, middle # Closing notes --- class: center, middle # Python's scientific ecosystem #### ## Adjusted from figure by Jake VanderPlas --- class: center, middle, bgheader background-image: url(./static/img/JakeVdP-ecosystem1.svg) background-size: cover --- count: false class: center, middle, bgheader background-image: url(./static/img/JakeVdP-ecosystem2.svg) background-size: cover --- count: false class: center, middle, bgheader background-image: url(./static/img/JakeVdP-ecosystem3.svg) background-size: cover --- count: false class: center, middle, bgheader background-image: url(./static/img/JakeVdP-ecosystem4.svg) background-size: cover --- count: false class: center, middle, bgheader background-image: url(./static/img/JakeVdP-ecosystem5.svg) background-size: cover --- # A rich ecosystem of packages:
**Machine learning**: scikit-learn, tensorflow, pytorch, keras, chainer, ... **Performance**: Numba, Cython, Numexpr, Pythran, C/Fortran wrappers, ... **Visualisation**: Bokeh, Seaborn, Plotnine, Altair, Plotly, Mayavi, HoloViews, datashader, vaex ... **Data structures and parallel/distributed computation**: Xarray, Dask, Distributed, Cupy, ... Specialized packages in many **scientific fields**: astronomy, natural language processing, image processing, geospatial, ... **Packaging and distribution**: pip/wheels, conda, Anaconda, Canopy, ... --- class: center, middle ### Reading advice [Good Enough Practices in Scientific Computing](https://arxiv.org/pdf/1609.00037v1.pdf) > "*However, while most scientists are careful to validate their laboratory and field equipment, most do not know how reliable their software is*" --- class: center, middle # Thanks ### Joris Van den Bossche
jorisvandenbossche
### Stijn Van Hoey
stijnvanhoey