Python multiprocessing memory usage

Asked
Active 3 hr before
Viewed 126 times

8 Answers

python, memory, usage
90%

The multiprocessing module is effectively based on the fork system call, which creates a copy of the current process. Since you are loading the huge data before you fork (or create the multiprocessing.Process), the child process inherits a copy of the data.

You can avoid this situation by calling multiprocessing.Process before you load your huge data. Then the additional memory allocations will not be reflected in the child process when you load the data in the parent.

The question being answered: "It seems to me that the sub-process gets its own copy of the huge dataset (when checking memory usage with top). Is this true? And if so, how can I avoid it (essentially using double the memory)?"

Follow-up comment: "Thanks for the answer. Calling multiprocessing.Process before loading the data seems to have solved the issue. I will look into meminfo as well." – FableBlaze Feb 7 '13 at 11:58
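
As a minimal sketch of that ordering (names here are illustrative, not from the original post): start the child before the large allocation, so the fork happens while the parent is still small.

# Minimal sketch (illustrative names, not from the original post): on a
# fork-based start method (e.g. Linux), memory allocated *before* the child
# is created gets inherited by it, so create the worker first and load the
# data afterwards.
import multiprocessing

def worker(queue):
    # the child only sees what existed at fork time plus what is sent to it
    for item in iter(queue.get, "END"):
        pass  # process item

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    child = multiprocessing.Process(target=worker, args=(queue,))
    child.start()                    # fork happens here, before the big allocation
    big_data = [0] * (50 * 10**6)    # loaded only in the parent
    for item in big_data[:10]:
        queue.put(item)
    queue.put("END")
    child.join()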

88%

With Python 3's performance enhancements, Python is faster than ever. Understanding Python memory management and taking full advantage of multiprocessing will let you speed up your CPU-bound Python programs using multiple CPUs or cores, while keeping track of memory usage with memory profiling.

However, experienced Python developers will tell you to watch that memory usage; you have to program mindfully to get the most out of Python. In data science and machine learning there is a need for speeding up CPU-bound programs, and it is often done by leveraging multiprocessing.

Once you have a basic understanding of how Python memory management works, you know that you have to sidestep the Global Interpreter Lock. In Python, instead of using threads, you can use multiprocessing. The real advantage of multiprocessing is that you avoid many of the usual multithreading problems, such as data corruption and deadlocks. Although you will have a larger memory footprint than with a multithreading model, it is a good tradeoff.
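
As a minimal sketch of that idea (the workload below is made up for illustration, not taken from the article), a CPU-bound function can be spread across cores with multiprocessing.Pool, which threads cannot do in parallel because of the GIL:

# Minimal sketch (illustrative workload, not from the article): spreading a
# CPU-bound function over multiple processes, which the GIL prevents threads
# from doing in parallel.
import multiprocessing

def cpu_bound(n):
    # deliberately heavy pure-Python loop
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with multiprocessing.Pool() as pool:   # defaults to one worker per CPU
        results = pool.map(cpu_bound, [10**6] * 8)
    print(results)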

72%

multiprocessing.shared_memory provides shared memory for direct access across processes. A subclass of BaseManager can be used for the management of shared memory blocks across processes; it provides methods for creating and returning SharedMemory instances and for creating a list-like object (ShareableList) backed by shared memory. The following example depicts how one, two, or many processes may access the same block of shared memory by supplying the name of the block behind it (a ShareableList sketch follows the SharedMemory example).

>>> from multiprocessing import shared_memory
>>> shm_a = shared_memory.SharedMemory(create=True, size=10)
>>> type(shm_a.buf)
<class 'memoryview'>
>>> buffer = shm_a.buf
>>> len(buffer)
10
>>> buffer[:4] = bytearray([22, 33, 44, 55])  # Modify multiple at once
>>> buffer[4] = 100                           # Modify single byte at a time
>>> # Attach to an existing shared memory block
>>> shm_b = shared_memory.SharedMemory(shm_a.name)
>>> import array
>>> array.array('b', shm_b.buf[:5])  # Copy the data into a new array.array
array('b', [22, 33, 44, 55, 100])
>>> shm_b.buf[:5] = b'howdy'  # Modify via shm_b using bytes
>>> bytes(shm_a.buf[:5])      # Access via shm_a
b'howdy'
>>> shm_b.close()   # Close each SharedMemory instance
>>> shm_a.close()
>>> shm_a.unlink()  # Call unlink only once to release the shared memory
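
The ShareableList mentioned above works the same way; a short sketch (adapted, not copied verbatim from the docs) of two handles attached to the same list by name:

>>> from multiprocessing import shared_memory
>>> sl_a = shared_memory.ShareableList([1, 2.5, 'text', None, True])
>>> sl_b = shared_memory.ShareableList(name=sl_a.shm.name)  # attach by name
>>> sl_b[0] = 42   # a change made through one handle...
>>> sl_a[0]        # ...is visible through the other
42
>>> sl_a.shm.close()
>>> sl_b.shm.close()
>>> sl_a.shm.unlink()  # release the block once, from a single handle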
65%

It seems to me that the sub-process gets its own copy of the huge dataset (when checking memory usage with top). Is this true? And if so, how can I avoid it (essentially using double the memory)? The real code (especially writeOutput()) is a lot more complicated: writeOutput() only uses the values it takes as its arguments (meaning it does not reference data).

I have written a program that can be summarized as follows:

import multiprocessing

def loadHugeData():
    # load it
    return data

def processHugeData(data, res_queue):
    for item in data:
        # process it
        res_queue.put(result)
    res_queue.put("END")

def writeOutput(outFile, res_queue):
    with open(outFile, 'w') as f:
        res = res_queue.get()
        while res != 'END':
            f.write(res)
            res = res_queue.get()

res_queue = multiprocessing.Queue()

if __name__ == '__main__':
    data = loadHugeData()
    p = multiprocessing.Process(target=writeOutput, args=(outFile, res_queue))
    p.start()
    processHugeData(data, res_queue)
    p.join()
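
For reference, the fix the asker reports in the comment above amounts to reordering the main block so the writer process is started before the data is loaded; a sketch of that change (same functions as above):

# Sketch of the reported fix: fork the child before the large load, so the
# allocation only ever exists in the parent.
if __name__ == '__main__':
    p = multiprocessing.Process(target=writeOutput, args=(outFile, res_queue))
    p.start()                  # child is created while the footprint is still small
    data = loadHugeData()      # loaded after the fork, parent-only
    processHugeData(data, res_queue)
    p.join()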
75%

I have written a program that can be summarized as shown above. I am using Python 2.6 and the program is running on Linux. As shown in the comments to my question, the answer came from Puciek. As an aside, the Python dictionary implementation consumes a surprisingly small amount of memory.

40%

I found that memory usage (both VIRT and RES) kept growing until close()/join(); is there any solution to get rid of this? I tried maxtasksperchild with 2.7, but it didn't help either. I didn't put any lock there, as I believe the main process is single-threaded (the callback is more or less an event-driven thing, per the docs I read). Related: python - Will pypy memory usage grow forever?; php - Memory never continues to grow.

Here's the program:

#!/usr/bin/python

import multiprocessing

def dummy_func(r):
    pass

def worker():
    pass

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=16)
    for index in range(0, 100000):
        pool.apply_async(worker, callback=dummy_func)

    # clean up
    pool.close()
    pool.join()
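
A common way to keep that from growing (a sketch, not taken from the original post) is to bound how many tasks and results are outstanding at once, for example by submitting the work in batches and draining each batch before queueing the next:

#!/usr/bin/python
# Sketch (not from the original post): bound memory by submitting the work in
# batches and waiting for each batch before queueing the next one.
import multiprocessing

def dummy_func(r):
    pass

def worker():
    pass

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=16)
    total_tasks = 100000
    batch_size = 1000
    for _ in range(total_tasks // batch_size):
        jobs = [pool.apply_async(worker, callback=dummy_func)
                for _ in range(batch_size)]
        for job in jobs:
            job.wait()   # results are consumed batch by batch, never all at once
    pool.close()
    pool.join()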
22%

This is a pure Python module for monitoring the memory consumption of a process, as well as line-by-line analysis of memory consumption for Python programs; it depends on the psutil module. The examples below were tested using python-3.7.3, memory_profiler-0.57.0 and numpy-1.19.5.

Install via pip:

$ pip install -U memory_profiler

To install from source, download the package, extract and type:

$ python setup.py install

In the following example, we create a simple function my_func that allocates lists a, b and then deletes b:

@profile
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a

if __name__ == '__main__':
    my_func()

Execute the code passing the option -m memory_profiler to the python interpreter to load the memory_profiler module and print to stdout the line-by-line analysis. If the file name was example.py, this would result in:

$ python -m memory_profiler example.py

Output will follow:

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     3   38.816 MiB   38.816 MiB           1   @profile
     4                                         def my_func():
     5   46.492 MiB    7.676 MiB           1       a = [1] * (10 ** 6)
     6  199.117 MiB  152.625 MiB           1       b = [2] * (2 * 10 ** 7)
     7   46.629 MiB -152.488 MiB           1       del b
     8   46.629 MiB    0.000 MiB           1       return a

A function decorator is also available. Use as follows:

from memory_profiler import profile

@profile
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a

In the function decorator, you can specify the precision as an argument to the decorator. Use as follows:

from memory_profiler import profile

@profile(precision=4)
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a

Sometimes it is useful to have full memory usage reports as a function of time (not line-by-line) of external processes (be it Python scripts or not). In this case the executable mprof might be useful. Use it like:

mprof run <executable>
mprof plot

To create a report that combines memory usage of all the children and the parent, use the include_children flag in either the profile decorator or as a command line argument to mprof:

mprof run --include-children <script>

The second method tracks each child independently of the main process, serializing child rows by index to the output stream. Use the multiprocess flag and plot as follows:

mprof run --multiprocess <script>
mprof plot

It is possible to set breakpoints depending on the amount of memory used. That is, you can specify a threshold and as soon as the program uses more memory than what is specified in the threshold it will stop execution and run into the pdb debugger. To use it, you will have to decorate the function as done in the previous section with @profile and then run your script with the option -m memory_profiler --pdb-mmem=X, where X is a number representing the memory threshold in MB. For example:

$ python -m memory_profiler --pdb-mmem=100 my_script.py

The memory_usage function returns the memory usage of a process sampled over a time interval; passing -1 monitors the current process:

>>> from memory_profiler import memory_usage
>>> mem_usage = memory_usage(-1, interval=.2, timeout=1)
>>> print(mem_usage)
[7.296875, 7.296875, 7.296875, 7.296875, 7.296875]

If you'd like to get the memory consumption of a Python function, then you should specify the function and its arguments in the tuple (f, args, kw). For example:

>>> # define a simple function
>>> def f(a, n=100):
...     import time
...     time.sleep(2)
...     b = [a] * n
...     time.sleep(1)
...     return b
...
>>> from memory_profiler import memory_usage
>>> memory_usage((f, (1,), {'n': int(1e6)}))

This will execute the code f(1, n=int(1e6)) and return the memory consumption during this execution.
Reports can also be redirected to a file-like object by passing it to the decorator's stream argument:

>>> fp = open('memory_profiler.log', 'w+')
>>> @profile(stream=fp)
... def my_func():
...     a = [1] * (10 ** 6)
...     b = [2] * (2 * 10 ** 7)
...     del b
...     return a
The LogFile class of the memory_profiler module can be used to redirect output that would otherwise go to stdout:

>>> from memory_profiler import LogFile
>>> import sys
>>> sys.stdout = LogFile('memory_profile_log')

Increment reporting can be disabled with the reportIncrementFlag argument:

>>> from memory_profiler import LogFile
>>> import sys
>>> sys.stdout = LogFile('memory_profile_log', reportIncrementFlag=False)

To activate it whenever you start IPython, edit the configuration file for your IPython profile, ~/.ipython/profile_default/ipython_config.py, to register the extension like this (If you already have other extensions, just add this one to the list):

c.InteractiveShellApp.extensions = [
   'memory_profiler',
]

It then can be used directly from IPython to obtain a line-by-line report using the %mprun or %%mprun magic command. In this case, you can skip the @profile decorator and instead use the -f parameter, like this. Note however that function my_func must be defined in a file (cannot have been defined interactively in the Python interpreter):

In [1]: from example import my_func, my_func_2

In [2]: %mprun -f my_func my_func()

or in cell mode:

In [3]: %%mprun -f my_func -f my_func_2
   ...: my_func()
   ...: my_func_2()

Another useful magic that we define is %memit, which is analogous to %timeit. It can be used as follows:

In [1]: %memit range(10000)
peak memory: 21.42 MiB, increment: 0.41 MiB

In [2]: %memit range(1000000)
peak memory: 52.10 MiB, increment: 31.08 MiB

or in cell mode (with setup code):

In [3]: %%memit l = range(1000000)
   ...: len(l)
   ...:
peak memory: 52.14 MiB, increment: 0.08 MiB

For IPython 0.10, you can install it by editing the IPython configuration file ~/.ipython/ipy_user_conf.py to add the following lines:

# These two lines are standard and probably already there.
import IPython.ipapi
ip = IPython.ipapi.get()

# These two are the important ones.
import memory_profiler
memory_profiler.load_ipython_extension(ip)
60%

I don't understand how memory usage with multiprocessing.Pool() works. My code is complex, but I'll do my best to describe it. I eventually found a workaround by tweaking maxtasksperchild and chunksize to limit the total amount of memory used at any given time; it feels like a band-aid, but at least I'm up and running. (I was asked to paste the whole thing, so here it is.)

I have a script which loads about 50MB worth of data. I want to evaluate an expensive function (about 48s each time) 16000 times. So I look to multiprocessing to help me with this. Here is the basic layout, but I'll snip some of the details that (I think) don't matter.

from multiprocessing import Pool, cpu_count

import h5py as hdf   # assumption: 'hdf' here refers to h5py
import myglobals     # empty myglobals.py file

with hdf.File('file.hdf5', 'r') as f:
    dset = f[f.keys()[0]]
    data = dset.values  # this is my data

# make a mask to select the data we want
mask = <mask>

myglobals.data = data[mask]

# try to reduce footprint
del data

# do some work to group the similar objects together and organize the data
selection = <work>

p = Pool(cpu_count(), maxtasksperchild=100)

result = p.imap(mp_worker, selection, chunksize=50)

p.close()
p.join()