This was my first EuroPython conference and I had high expectations because I had heard a lot of good things about it. I must say that overall it didn’t let me down. I learned several new things and met a lot of new people. So let's dive straight into the most important lessons.
On Tuesday I attended the “Effective Python for High-Performance Parallel Computing” training session by Michael McKerns. This was by far my favorite training session and I learned a lot from it. Before Michael started with code examples and code analysis, he emphasized two things:
- Do not assume what you hear/read/think. Time it and measure it.
- Stupid code is fast! Intelligent code is slow!
At this point I knew that the session was going to be amazing. He gave us a GitHub link (https://github.com/mmckerns/tuthpc) where all the examples and their profiler results are located. He stressed that we shouldn’t believe him and that we should test them ourselves (lesson #1).
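In the spirit of lesson #1, here is a minimal sketch of how one might time two variants of the same task with the standard-library `timeit` module. The functions `join_with_loop` and `join_builtin` and the `best_time` helper are my own hypothetical examples and are not taken from Michael's repo. The numbers will differ from machine to machine, which is exactly why you should run it yourself.

```python
import timeit


def join_with_loop(words):
    """Build a string by repeated concatenation in a loop."""
    result = ""
    for word in words:
        result += word
    return result


def join_builtin(words):
    """Build the same string with the built-in str.join."""
    return "".join(words)


# Arbitrary sample input chosen for illustration.
WORDS = ["spam"] * 1000


def best_time(func, repeat=5, number=200):
    """Return the best total time in seconds over `repeat` runs of `number` calls."""
    return min(timeit.repeat(lambda: func(WORDS), repeat=repeat, number=number))


if __name__ == "__main__":
    for func in (join_with_loop, join_builtin):
        print(f"{func.__name__}: {best_time(func):.4f} s")
```

Taking the minimum of several repeats is a common way to reduce noise from other processes, but it is only one possible choice.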
I strongly suggest cloning his GitHub repo (https://github.com/mmckerns/tuthpc) and testing those examples yourself. Here are my quick notes (TL;DR):
- always compile regular expressions
- use local variables (true = True, local = GLOBAL)
- if you know how many elements your list will hold, create it with None elements and then fill it (L = [None] * N)
- when inserting items at index 0 of a list, use append and then reverse (each insert at index 0 is O(n), each append is O(1))
- use built-in functions, use built-in functions, use built-in functions!!! (they are implemented in C)
- when extending a list, use .extend() and not +
- searching in a set (hash map) is a lot faster than searching in a list (O(1) vs O(n))
- constructing a set is much slower than constructing a list, so you usually don’t want to transform a list into a set just to search in it, because overall it will be slower. But again, you should test it
- += on a mutable object such as a list modifies it in place instead of creating a new instance, so use it in loops
- a list comprehension is faster than a generator. A for loop is faster than a generator and sometimes also faster than a list comprehension (you should test it!)
- importing is expensive (e.g. importing numpy takes about 0.1 sec)
- switching between Python arrays and numpy arrays is very expensive
- if you start writing intelligent and complex code, you should stop and ask yourself whether there is a more stupid way of achieving your goal (see lesson #2)
- optimize the code you want to run in parallel first. This is more important than just running it in parallel.
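To make a few of these notes concrete, here is a rough sketch that compares some of the variants side by side. The function names, the `report` helper and the size `N` are my own assumptions for illustration and do not come from the training material. I make no claim about which variant wins on your machine, so run it and see.

```python
import re
import timeit

N = 2000  # arbitrary size chosen for illustration


def prepend_with_insert(items):
    """Insert every item at index 0; each insert shifts the whole list."""
    out = []
    for item in items:
        out.insert(0, item)
    return out


def prepend_with_append(items):
    """Append every item, then reverse once at the end."""
    out = []
    for item in items:
        out.append(item)
    out.reverse()
    return out


def squares_appended(n):
    """Grow the list one append at a time."""
    out = []
    for i in range(n):
        out.append(i * i)
    return out


def squares_preallocated(n):
    """Create the list at full size with None, then fill it by index."""
    out = [None] * n
    for i in range(n):
        out[i] = i * i
    return out


WORD = re.compile(r"\w+")  # compiled once, reused on every call


def count_words(text):
    """Count word-like tokens with the precompiled pattern."""
    return len(WORD.findall(text))


def report(label, func, *args, number=100):
    """Print the best of three timings for func(*args)."""
    best = min(timeit.repeat(lambda: func(*args), repeat=3, number=number))
    print(f"{label:<20}{best:.4f} s")


if __name__ == "__main__":
    data = list(range(N))
    report("insert(0, x)", prepend_with_insert, data)
    report("append + reverse", prepend_with_append, data)
    report("append", squares_appended, N)
    report("preallocate", squares_preallocated, N)
    as_list, as_set = data, set(data)
    report("x in list", lambda: (N - 1) in as_list)
    report("x in set", lambda: (N - 1) in as_set)
```

Note that the set is built once outside the timed call. If you include the cost of `set(data)` in every lookup, the comparison can look very different, which is the point of the note about constructing sets.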
Here is a full blog post that I have written for NiteoWeb.