.. Lesson Plan documentation master file, created by sphinx-quickstart on Sun Jan 28 19:33:27 2018. You can adapt this file completely to your liking, but it should at least contain the root `toctree` directive. ======================= Profiling & Performance ======================= .. toctree:: :maxdepth: 2 :caption: Contents: ************ Introduction ************ In this module we cover strategies, tools and pitfalls around understanding the performance of a python code base. We mention or demonstrate analysis tools built into specific operating systems or available via python itself. We then we look at techniques for improving performance, including coding techniques, alternate Python interpreters, and interaction with C. Keep in mind that the material presented herein is not the last word on this topic, rather, as is often the case with the material in the program, it is merely the first. There are related topics covered elsewhere in the curriculum, such as concurrency and asynchronous programming; both topics are about improving the performance or responsiveness of systems. Also keep in mind that related topics are covered more deeply in other areas of the curriculum, so that for instance it might be worthwhile referencing the material on map/filter, comprehensions, lambdas and other functional programming constructs from their respective lessons while or before continuing with this lesson. Learning Objectives =================== Upon successful completion of this lesson, you will be able to: * use profiling strategies to identify bottlenecks in Python code. * use timing strategies to assess code constructs. * use PyPy to run simple Python scripts to improve their performance. * refactor Python code to use Cython for performance improvement. Recommended Text ================ | Mastering Python High Performance | by Fernando Doglio | Published September 2015 | ISBN: 9781783989300 https://www.packtpub.com/application-development/mastering-python-high-performance New Words or Concepts ===================== * Profiling * cProfile * time (GNU Time) * timeit * Great Circle * memoization * PyPy * Cython * cdef Required Reading ================ * Profiling & Timing | https://docs.python.org/3.6/library/debug.html | https://docs.python.org/3.6/library/profile.html | https://docs.python.org/3.6/library/timeit.html * PyPy | http://pypy.org/ * Cython | http://cython.org Optional Reading ================ * Profiling | https://pypi.python.org/pypi/memory_profiler | https://www.jetbrains.com/help/pycharm/profiler.html ******* Content ******* At some point, perhaps even during this curriculum, you may have heard someone say that comprehensions are fast. In a certain general sense perhaps they are, but in this context --- performance --- speed should strictly be considered a relative metric. In other words the question is not whether comprehensions are fast in general, but whether a comprehension is fast relative to some other construct in the specific context of the programming problem you are trying to address. To carry this thought further, consider some of the alternatives to comprehensions. Do you need to instantiate a list (or dictionary) immediately? If not, perhaps your use case would be better served by map and filter. Map returns a lazy iterator in Python 3.x, which is to say that when it is called it does not instantiate an entire list as does a comprehension. Instead a map object yields up subsequent values on subsequent calls thereby spreading computational cost over a larger timespan. Perhaps that is enough to make your graphic user interface or ReST API more responsive. Speaking of yield, would a generator be better than a comprehension for your use case? If you strictly need a sequence generated by an algorithm, it may well be, both in terms of memory consumption and in terms of raw speed. However, knowing your options in terms of programming constructs is not the last word on understanding the performance of your program. And gaining a view into the performance of your program in isolation does not necessarily translate into an understanding of its performance in a production system incurring high, bursting loads, experiencing network latencies, blocking on calls to a message queue, or retrieving data from a database running on a separate machine. Keep all of this in mind, because one of the themes of this lesson is that you should be skeptical of what you think you know until you have observed a system in a representative context. Profiling & Timing ================== In its broadest sense profiling is observing and measuring a system, an individual program or a code snippet in a context. The goal is to understand performance and resource consumption characteristics. The context should be representative, and when possible it should avoid the introduction of extraneous artifacts. For instance, when profiling code snippets of alternate algorithms, the same python interpreter and version should be used. When profiling your code on alternate interpreters, a single machine should be used. If a program depends on user input or on data from extraneous systems, mock or simulate these vectors in a problematically reproducible fashion so that multiple runs of the program are not differentially influenced. In short, avoid confounding factors and control for your variables. At a high level, before resorting to tools specific to Python, operating systems provide performance monitoring tools such as Activity Monitor (OS X), top (Linux & OS X), and Performance Monitor (Windows). These tools can be a good first place to look for memory, processor or network hogs. time ---- GNU Time, or something very close to it, is available on all major operating systems. It measures the duration of processes and can be used to clock virtually any program with a command line interface. https://www.gnu.org/software/time/ .. code-block:: bash $ time -p find ~/Documents real 3.02 user 0.15 sys 0.62 Gnu-time installed via Homebrew on Mac OS X for more depth. .. code-block:: bash $ brew install gnu-time $ gtime --verbose find ~/Documents Command being timed: "find /Users/demo-user/Documents" User time (seconds): 0.06 System time (seconds): 0.27 Percent of CPU this job got: 62% Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.54 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 1292 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 0 Minor (reclaiming a frame) page faults: 449 Voluntary context switches: 1384 Involuntary context switches: 1028 Swaps: 0 File system inputs: 0 File system outputs: 0 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0 [Video: Time] timeit ------ In addition to the command line time tool referenced above which is useful to time the run of your entire script, Python offers timeit which allows you to time expressions and calls within python modules. https://docs.python.org/3.6/library/timeit.html Timeit is used extensively in the videos associated with this lesson. [Video: Profiling] cProfile -------- The same manner in which Python offers debuggers (pdb, ipdb) it also supplies a profiler. With little or no modification to your module's source cProfile provides statistics on the number of times a function or method is called and the cumulative time spent within. The profiler when invoked from the command line with default arguments provides information not only about your script, but also about how it exercises the Python interpreter. .. code-block:: bash $ python -m cProfile great_circle.py 12000539 function calls (12000538 primitive calls) in 3.179 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 2 0.000 0.000 0.000 0.000 :103(release) 2 0.000 0.000 0.000 0.000 :143(__init__) 2 0.000 0.000 0.000 0.000 :147(__enter__) 2 0.000 0.000 0.000 0.000 :151(__exit__) 2 0.000 0.000 0.000 0.000 :157(_get_module_lock) 2 0.000 0.000 0.000 0.000 :176(cb) 3 0.000 0.000 0.000 0.000 :211(_call_with_frames_removed) 46 0.000 0.000 0.000 0.000 :222(_verbose_message) 2 0.000 0.000 0.000 0.000 :307(__init__) 2 0.000 0.000 0.000 0.000 :311(__enter__) 2 0.000 0.000 0.000 0.000 :318(__exit__) 8 0.000 0.000 0.000 0.000 :321() 1 0.000 0.000 0.000 0.000 :35(_new_module) 2 0.000 0.000 0.000 0.000 :369(__init__) 3 0.000 0.000 0.000 0.000 :403(cached) 2 0.000 0.000 0.000 0.000 :416(parent) 2 0.000 0.000 0.000 0.000 :424(has_location) 2 0.000 0.000 0.000 0.000 :504(_init_module_attrs) 2 0.000 0.000 0.000 0.000 :564(module_from_spec) 2 0.000 0.000 0.000 0.000 :58(__init__) 2 0.000 0.000 0.001 0.000 :651(_load_unlocked) 2 0.000 0.000 0.000 0.000 :707(find_spec) 2 0.000 0.000 0.000 0.000 :78(acquire) 2 0.000 0.000 0.000 0.000 :780(find_spec) 6 0.000 0.000 0.000 0.000 :843(__enter__) 6 0.000 0.000 0.000 0.000 :847(__exit__) 2 0.000 0.000 0.000 0.000 :870(_find_spec) 2 0.000 0.000 0.001 0.000 :936(_find_and_load_unlocked) 2 0.000 0.000 0.001 0.000 :966(_find_and_load) 11 0.000 0.000 0.000 0.000 :1080(_path_importer_cache) 2 0.000 0.000 0.000 0.000 :1117(_get_spec) 2 0.000 0.000 0.000 0.000 :1149(find_spec) 2 0.000 0.000 0.000 0.000 :1228(_get_spec) 9 0.000 0.000 0.000 0.000 :1233(find_spec) 2 0.000 0.000 0.000 0.000 :263(cache_from_source) 2 0.000 0.000 0.000 0.000 :361(_get_cached) 9 0.000 0.000 0.000 0.000 :37(_relax_case) 1 0.000 0.000 0.000 0.000 :393(_check_name_wrapper) 1 0.000 0.000 0.000 0.000 :430(_validate_bytecode_header) 1 0.000 0.000 0.000 0.000 :485(_compile_bytecode) 2 0.000 0.000 0.000 0.000 :52(_r_long) 2 0.000 0.000 0.000 0.000 :524(spec_from_file_location) 42 0.000 0.000 0.000 0.000 :57(_path_join) 42 0.000 0.000 0.000 0.000 :59() 2 0.000 0.000 0.000 0.000 :63(_path_split) 1 0.000 0.000 0.000 0.000 :669(create_module) 1 0.000 0.000 0.000 0.000 :672(exec_module) 1 0.000 0.000 0.000 0.000 :743(get_code) 12 0.000 0.000 0.000 0.000 :75(_path_stat) 1 0.000 0.000 0.000 0.000 :800(__init__) 1 0.000 0.000 0.000 0.000 :825(get_filename) 1 0.000 0.000 0.000 0.000 :830(get_data) 1 0.000 0.000 0.000 0.000 :840(path_stats) 2 0.000 0.000 0.000 0.000 :85(_path_is_mode_type) 1 0.000 0.000 0.000 0.000 :908(__init__) 1 0.000 0.000 0.000 0.000 :919(create_module) 1 0.000 0.000 0.000 0.000 :927(exec_module) 2 0.000 0.000 0.000 0.000 :94(_path_isfile) 1 0.000 0.000 0.000 0.000 cProfile.py:27(Profile) 1 0.000 0.000 0.000 0.000 cProfile.py:5() 1000000 0.150 0.000 0.150 0.000 great_circle.py:12(calculate_x) 2000000 0.216 0.000 0.216 0.000 great_circle.py:16(calculate_coordinate) 1000000 0.113 0.000 0.113 0.000 great_circle.py:20(calculate_theta) 1000000 0.952 0.000 1.500 0.000 great_circle.py:24(calculate_acos) 1000000 0.958 0.000 2.937 0.000 great_circle.py:28(great_circle_factored) 1 0.240 0.240 3.178 3.178 great_circle.py:38(main) 1 0.000 0.000 3.179 3.179 great_circle.py:8() 1 0.000 0.000 0.000 0.000 {built-in method _imp._fix_co_filename} 10 0.000 0.000 0.000 0.000 {built-in method _imp.acquire_lock} 1 0.000 0.000 0.000 0.000 {built-in method _imp.create_dynamic} 1 0.000 0.000 0.000 0.000 {built-in method _imp.exec_dynamic} 2 0.000 0.000 0.000 0.000 {built-in method _imp.is_builtin} 2 0.000 0.000 0.000 0.000 {built-in method _imp.is_frozen} 10 0.000 0.000 0.000 0.000 {built-in method _imp.release_lock} 4 0.000 0.000 0.000 0.000 {built-in method _thread.allocate_lock} 4 0.000 0.000 0.000 0.000 {built-in method _thread.get_ident} 1 0.000 0.000 0.000 0.000 {built-in method builtins.__build_class__} 2 0.000 0.000 0.000 0.000 {built-in method builtins.any} 2/1 0.000 0.000 3.179 3.179 {built-in method builtins.exec} 12 0.000 0.000 0.000 0.000 {built-in method builtins.getattr} 13 0.000 0.000 0.000 0.000 {built-in method builtins.hasattr} 12 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance} 4 0.000 0.000 0.000 0.000 {built-in method builtins.len} 2 0.000 0.000 0.000 0.000 {built-in method from_bytes} 1 0.000 0.000 0.000 0.000 {built-in method marshal.loads} 1000000 0.106 0.000 0.106 0.000 {built-in method math.acos} 3000000 0.261 0.000 0.261 0.000 {built-in method math.cos} 2000000 0.181 0.000 0.181 0.000 {built-in method math.sin} 4 0.000 0.000 0.000 0.000 {built-in method posix.fspath} 4 0.000 0.000 0.000 0.000 {built-in method posix.getcwd} 12 0.000 0.000 0.000 0.000 {built-in method posix.stat} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 3 0.000 0.000 0.000 0.000 {method 'endswith' of 'str' objects} 4 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects} 44 0.000 0.000 0.000 0.000 {method 'join' of 'str' objects} 1 0.000 0.000 0.000 0.000 {method 'read' of '_io.FileIO' objects} 17 0.000 0.000 0.000 0.000 {method 'rpartition' of 'str' objects} 86 0.000 0.000 0.000 0.000 {method 'rstrip' of 'str' objects} When invoked from within an interpreter you can have the profiler be more selective with its reporting. .. code-block:: ipython $ ipython Python 3.6.4 (default, Jan 6 2018, 11:51:59) Type 'copyright', 'credits' or 'license' for more information IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help. In [1]: %run great_circle.py In [2]: %prun main() 12000004 function calls in 3.165 seconds Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) 1000000 0.937 0.000 1.479 0.000 great_circle.py:24(calculate_acos) 1000000 0.926 0.000 2.898 0.000 great_circle.py:28(great_circle_factored) 1 0.267 0.267 3.165 3.165 great_circle.py:38(main) 3000000 0.258 0.000 0.258 0.000 {built-in method math.cos} 2000000 0.219 0.000 0.219 0.000 great_circle.py:16(calculate_coordinate) 2000000 0.185 0.000 0.185 0.000 {built-in method math.sin} 1000000 0.157 0.000 0.157 0.000 great_circle.py:12(calculate_x) 1000000 0.116 0.000 0.116 0.000 great_circle.py:20(calculate_theta) 1000000 0.099 0.000 0.099 0.000 {built-in method math.acos} 1 0.000 0.000 3.165 3.165 {built-in method builtins.exec} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 1 0.000 0.000 3.165 3.165 :1() In [3]: %prun calculate_x() 4 function calls in 0.000 seconds Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 0.000 0.000 {built-in method builtins.exec} 1 0.000 0.000 0.000 0.000 great_circle.py:12(calculate_x) 1 0.000 0.000 0.000 0.000 :1() 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} Note that PyCharm supports profilers, including cProfile. https://www.jetbrains.com/help/pycharm/profiler.html .. Memoization .. =========== .. Resource: https://anandology.com/python-practice-book/functional-programming.html#example-memoize .. Resource: Screenshots of Joe's code from 2018-03-07 PyPy ==== PyPy is an alternative Python interpreter focused on performance. It uses an optimizing just-in-time compiler which is particularly well suited to long-running programs. With very large data sets Python sometimes chokes whereas PyPy keeps going. PyPy is best evaluated directly vis-a-via the standard CPython reference interpreter. Profile code under standard Python and then under PyPy. PyPy can improve code that is factored out to functions, the more granular the better in some cases. Note that in the video, during the last timing test with a large number of iterations, PyPy runs faster with the factored code than the unfactored code. Remember to rely on timing and profiling to determine the best structure for your use case. https://github.com/rriehle/ProfilingPerformance/tree/master/source/scripts [Video: PyPy] Cython ====== [Video: Great Circle Part 1] Cython is Python's bridge to C. In allows you to continue to work largely in Python, albeit Python with a few extensions, and yet produce fast, statically compiled libraries to link into your Python code. Cython provides a keyword, cdef which provides access to C types such as char, int, long, float, double, struct and enum. It also enables type delectations for function or method return values. Take for instance the following function declaration. .. code-block:: python cdef double great_circle(double lon1, double lat1, double lon2, double lat2): cdef double a, b, c, radius, theta, x The great_circle function above is defined specifying double as the type of its return value. Each argument is declared to be of type double, and a series of symbols which will be used later in the function are all defined to be of type double. The cython compiler uses this information to produce a statically compiled library which is called from Python. Since this is no longer standard Python we no longer save it in files with .py extensions, instead by convention we use .pyx. Cython involves a different workflow, because it is a statically compiled environment. Statically compiled programming environments, including C and Cython, have separate compile, link and run steps. You can't simply type your code into an interpreter and get an immediate result. Statically compiled programming environments also have tooling to support the static build workflow. These tools include Python's distutils used by cython setup files and optionally Gnu Make used to manage the build dependencies of projects large and small. This is ``great_circle_setup_v1.py`` which is one of the cython setup files for use with the Great Circle code. .. code-block:: python from distutils.core import setup from Cython.Build import cythonize setup( name='Great Circle v1', ext_modules=cythonize("great_circle_v1.pyx"), ) The setup file tells Python how to have Cython build a statically compiled library to call from within Python. .. code-block:: bash $ python great_circle_setup_v1.py build_ext --inplace There are two main hurdles in learning to use Cython. The first involves the extensions to the Python language which are for the most part borrowed directly from C. Without familiarity with C it may not be clear what these new language elements mean, or how or why they are used. The second hurdle involves the build process associated with statically compiled languages --- i.e., the separate compile, link and run steps which will feel foreign to someone coming strictly from an interpreted language like Python. In both cases programmers familiar with C will have an advantage over those without. As with any new worthwhile programming strategy it will take an investment in time and effort to learn this new tool. The video links start to pull this information together. Arm yourself with the sample code when you watch Part 2. https://github.com/rriehle/ProfilingPerformance/tree/master/source/solutions/cython [Video: Great Circle Part 2] **** Quiz **** 1. The best way to improve the performance of your system is to. | Profile and time it first! | Jump right in and start making any changes that feel right! 2. In order to make Python perform well you need to know C. | True | False 3. Guido van Rossum, Python's creator, seems upset and worried the existence of competing implementations of the language such as PyPy. | True | False 4. In order to use PyPy you need to know C. | True | False 5. In order to use Cython you need to know C. | Not really | A little bit | Yeah, okay, it really helps 6. Comprehensions, map and filter, and generators can often be used to accomplish the same tasks; therefore it does not matter which programming constructs is used. | True | False ********** Activities ********** Spend class time in groups or as a whole using any of the modules or techniques referenced above. Of particular interest: * Explore code-only performance improvement strategies such as memoization. * Explore/demo cProfile side by side with PyCharm's built-in profilers. * Explore/demo Python's memory_profiler. https://pypi.python.org/pypi/memory_profiler * Use PyPy as the runtime interpreter for code developed by students during the curriculum. * Play with Cython and the Great Circle code! * Abandon Python's math library and use ``cdef extern`` to import and access C math functions into the Python code. (This vastly improves the performance of the Great Circle code!) * Dig into the compile and link steps required by Cython. * Work with setup files or any of the other methods to generate cython libraries. http://cython.readthedocs.io/en/latest/src/quickstart/build.html * Work with Cython profiling. http://docs.cython.org/en/latest/src/tutorial/profiling_tutorial.html * Create a Makefile to manage the build (compile/link steps) for all the Cython code you're generating. https://www.gnu.org/software/make/ ********** Assignment ********** There is no homework assignment for this lesson, since it is the last of the quarter. Instead spend class time on suggestions in the Activity section. ****************** Indices and tables ****************** * :ref:`genindex` * :ref:`modindex` * :ref:`search`