
    Python Application Notes: pathname manipulations

    SparkandShine, posted on 2017-01-31 20:25:55

    I usually process simulation results in batches, which generally involves pathname manipulations. In this article, I collect notes on pathname manipulation from my programming experience.

    1. Overview

    1.1 Terms related to pathname

    The terms related to pathnames are described below. The descriptions are excerpted from Wikipedia.

    pathname

    A path, the general form of the name of a file or directory, specifies a unique location in a file system. A path points to a file system location by following the directory tree hierarchy expressed in a string of characters in which path components, separated by a delimiting character, represent each directory.

    dirname

    dirname is a standard UNIX computer program. dirname will retrieve the directory-path name from a pathname ignoring any trailing slashes.

    basename

    basename is a standard UNIX computer program. basename will retrieve the last name from a pathname ignoring any trailing slashes.

    Note that os.path.basename(path) behaves differently from the Unix basename program: for '/foo/bar/', the basename program returns 'bar', whereas os.path.basename returns an empty string ''.
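
    The difference can be checked directly. The sketch below uses posixpath (the POSIX flavour of os.path) so that it behaves the same on every platform. unix_basename is a hypothetical helper, not part of the standard library, that strips trailing slashes first to approximate the Unix program.

```python
import posixpath  # POSIX flavour of os.path, importable on any platform


def unix_basename(path):
    """Hypothetical helper: approximate the Unix basename program."""
    stripped = path.rstrip('/')
    if not stripped:
        # The path was empty or consisted only of slashes (the root).
        return '/' if path else ''
    return posixpath.basename(stripped)


print(repr(posixpath.basename('/foo/bar/')))  # ''
print(repr(unix_basename('/foo/bar/')))       # 'bar'
```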

    filename

    Sometimes “filename” is used to mean the entire name, such as the Windows name c:\directory\myfile.txt. Sometimes, it will be used to refer to the components, so the filename in this case would be myfile.txt. Sometimes, it is a reference that excludes an extension, so the filename would be just myfile.

    filename extension

    A filename extension (such as txt) is an identifier specified as a suffix to the name of a computer file. The extension indicates a characteristic of the file contents or its intended use. A file extension is typically delimited from the filename with a full stop (period).
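
    The three meanings of "filename" and the extension can be pulled apart in code. This is a minimal sketch using ntpath (the Windows flavour of os.path) so that the Windows example name above is parsed the same way on any platform. The variable names are only illustrative.

```python
import ntpath  # Windows flavour of os.path, importable on any platform

full_name = r'c:\directory\myfile.txt'

name_with_ext = ntpath.basename(full_name)        # last path component
stem, extension = ntpath.splitext(name_with_ext)  # split off the suffix

print(full_name)      # c:\directory\myfile.txt
print(name_with_ext)  # myfile.txt
print(stem)           # myfile
print(extension)      # .txt
```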

    1.2 Python modules

    Three of the most commonly used Python modules for pathname manipulations are listed below.

    os.path

    Common pathname manipulations. This module implements some useful functions on pathnames. To read or write files see open(), and for accessing the filesystem see the os module. The path parameters can be passed as either strings, or bytes. 

    glob

    The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order.

    pathlib

    Object-oriented filesystem paths. This module offers classes representing filesystem paths with semantics appropriate for different operating systems.

    Path classes are divided between pure paths, which provide path-handling operations that don’t actually access a filesystem, and concrete paths, which inherit from pure paths but also provide methods to do system calls on path objects.

    2. os.path

    2.1 split and join

    A full file path (e.g., /Users/sparkandshine/Documents/main.py) consists of two components:

    • directory name (/Users/sparkandshine/Documents in this case). This is the first element of the pair returned by os.path.split(path).
    • base name (main.py in this case). This is the second element of the pair returned by os.path.split(path).

    os.path.split(path) splits the pathname path into a pair, (head, tail) where tail is the last pathname component and head is everything leading up to that.

    If path ends in a slash, tail will be empty. If there is no slash in path, head will be empty. If path is empty, both head and tail are empty. The tail part will never contain a slash. Trailing slashes are stripped from head unless it is the root (one or more slashes only).

    >>> import os
    
    >>> os.path.split('/Users/sparkandshine/Documents/main.py')
    ('/Users/sparkandshine/Documents', 'main.py')
    >>> os.path.split('/Users/sparkandshine/Documents')
    ('/Users/sparkandshine', 'Documents')
    
    >>> os.path.split('/Users/sparkandshine/Documents/')    # path ends in a slash
    ('/Users/sparkandshine/Documents', '')
    >>> os.path.split('main.py')                            # no slash in path
    ('', 'main.py')
    >>> os.path.split('')                                   # path is empty
    ('', '')
    >>> os.path.split('/')                                  # root
    ('/', '')
    
    
    # os.path.splitext(path) splits the pathname path into a pair (root, ext) such that `root + ext == path`
    >>> os.path.splitext('/Users/sparkandshine/Documents/main.py')
    ('/Users/sparkandshine/Documents/main', '.py')
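
    A few edge cases of splitext are worth keeping in mind when batch-processing files. The sketch below uses posixpath so the results do not depend on the platform. The file names are made up for illustration.

```python
import posixpath

# Only the last suffix is split off.
print(posixpath.splitext('archive.tar.gz'))  # ('archive.tar', '.gz')

# A leading dot is part of the root, not an extension.
print(posixpath.splitext('.bashrc'))         # ('.bashrc', '')

# No dot in the last component means an empty extension,
# even when a directory name contains a dot.
print(posixpath.splitext('README'))          # ('README', '')
print(posixpath.splitext('/a.b/c'))          # ('/a.b/c', '')
```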
    

    os.path.join(path, *paths) joins one or more path components intelligently.

    The return value is the concatenation of path and any members of *paths with exactly one directory separator (os.sep) following each non-empty part except the last, meaning that the result will only end in a separator if the last part is empty. If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component.

    >>> os.path.join('/Users/sparkandshine', 'Documents', 'main.py')
    '/Users/sparkandshine/Documents/main.py'
    
    # the last part is empty
    >>> os.path.join('/Users/sparkandshine', 'Documents', '')           
    '/Users/sparkandshine/Documents/'
    
    # a component is an absolute path
    >>> os.path.join('/Users/sparkandshine', '/home/sparkandshine', 'Documents', 'main.py')
    '/home/sparkandshine/Documents/main.py'
    

    os.path.join(head, tail) can be regarded as the inverse operation of os.path.split(path). In all cases, join(head, tail) returns a path to the same location as path (but the strings may differ).
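
    The round trip can be sketched as follows, again with posixpath for platform-independent results. roundtrip is a hypothetical helper name. The last input shows a case where the location is the same but the string differs, because split strips redundant slashes from head.

```python
import posixpath


def roundtrip(path):
    """Split a path and join the two parts back together."""
    head, tail = posixpath.split(path)
    return posixpath.join(head, tail)


for path in ['/Users/sparkandshine/Documents/main.py', 'main.py', '/', 'a//b']:
    print(repr(path), '->', repr(roundtrip(path)))

# 'a//b' comes back as 'a/b': same location, different string.
```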

    2.2 Basic use

    Some of the most commonly used functions are listed below.

    os.path.basename(path)  # Return the base name of pathname path, the second element returned by `split(path)`. 
    os.path.dirname(path)   # Return the directory name of pathname path, the first element returned by `split(path)`. 
    
    os.path.exists(path)    # Return True if path refers to an existing path or an open file descriptor.
    
    os.path.abspath(path)   # Return a normalized absolutized version of the pathname path. 
    os.path.isabs(path)     # Return True if path is an absolute pathname.
    os.path.normpath(path)  # Normalize a pathname by collapsing redundant separators and up-level references.
    
    os.path.getatime(path)  # Return the time of last access of path.
    os.path.getmtime(path)  # Return the time of last modification of path. 
    os.path.getctime(path)  # Return the system’s ctime which, on some systems (like Unix) is the time of the last metadata change, and, on others (like Windows), is the creation time for path. 
    
    os.path.getsize(path)   # Return the size, in bytes, of path. 
    
    
    os.path.isfile(path)    # Return True if path is an existing regular file.
    os.path.isdir(path)     # Return True if path is an existing directory. 
    os.path.islink(path)    # Return True if path refers to a directory entry that is a symbolic link.  
    os.path.ismount(path)   # Return True if pathname path is a mount point.
    
    os.path.samefile(path1, path2)  # Return True if both pathname arguments refer to the same file or directory. 
    os.path.sameopenfile(fp1, fp2)  # Return True if the file descriptors fp1 and fp2 refer to the same file.
    os.path.samestat(stat1, stat2)  # Return True if the stat tuples stat1 and stat2 refer to the same file.
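
    As a quick illustration, several of these functions can be tried on a throwaway file. This is only a sketch: the file name result.txt is an assumption, and a temporary directory is used so that nothing is left behind.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'result.txt')  # hypothetical file name
    with open(path, 'w') as f:
        f.write('hello')

    base = os.path.basename(path)              # 'result.txt'
    same_dir = os.path.dirname(path) == tmpdir
    existed = os.path.exists(path)
    is_file = os.path.isfile(path)
    is_dir = os.path.isdir(path)
    size = os.path.getsize(path)               # 5 bytes were written
    same = os.path.samefile(path, os.path.join(tmpdir, '.', 'result.txt'))

# The temporary directory and its contents are removed on exit.
exists_after = os.path.exists(path)
print(base, existed, is_file, is_dir, size, same, exists_after)
```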
    

    2.3 Create a directory if it doesn’t exist

    The snippet below creates a directory only when it does not already exist.

    subdir = 'msg_events_arbitrary'
    if not os.path.exists(subdir):
        os.makedirs(subdir)
    

    os.mkdir creates a single directory with a numeric mode. os.makedirs (https://docs.python.org/3/library/os.html#os.makedirs) is a recursive directory creation function: it works like os.mkdir(), but also creates all intermediate-level directories needed to contain the leaf directory.

    # Python2
    os.mkdir(path[, mode])
    os.makedirs(path[, mode])
    
    # Python3
    os.mkdir(path, mode=0o777, *, dir_fd=None)  
    os.makedirs(name, mode=0o777, exist_ok=False)   # If exist_ok is False (the default), an OSError is raised if the target directory already exists.
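
    In Python 3.2 and later, the exist_ok parameter makes the separate existence check unnecessary, which also avoids a race if another process creates the directory between the check and the call. A minimal sketch, run inside a temporary directory (the 'results' parent directory is an assumption added for illustration):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    subdir = os.path.join(tmpdir, 'results', 'msg_events_arbitrary')

    os.makedirs(subdir, exist_ok=True)  # creates 'results' and the leaf directory
    os.makedirs(subdir, exist_ok=True)  # second call does nothing and raises nothing

    raised = False
    try:
        os.makedirs(subdir)             # exist_ok defaults to False
    except FileExistsError:
        raised = True

    created = os.path.isdir(subdir)

print(created, raised)
```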
    

    3. glob

    The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order. *, ?, and character ranges expressed with [] will be correctly matched. glob treats filenames beginning with a dot . as special cases.

    If recursive is true, the pattern “**” will match any files and zero or more directories and subdirectories. If the pattern is followed by an os.sep (such as /), only directories and subdirectories match. (Support for recursive globs was added in Python 3.5.)

    glob.glob(pathname, *, recursive=False) # Return a possibly-empty list of path names that match pathname, which must be a string containing a path specification. 
    
    glob.iglob(pathname, *, recursive=False) # Return an iterator which yields the same values as glob() without actually storing them all simultaneously.
    
    glob.escape(pathname)                   # Escape all special characters ('?', '*' and '['). This is useful to match a string containing special characters.
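
    The special handling of dot files and the purpose of glob.escape are easier to see on a small example. The sketch below builds a temporary directory with made-up file names. Results are sorted because glob returns them in arbitrary order.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    for name in ['main.py', 'notes.txt', '.hidden.py', 'data[1].txt']:
        open(os.path.join(tmpdir, name), 'w').close()

    base = glob.escape(tmpdir)  # protect any special characters in the directory name

    def names(pattern):
        """Return the sorted base names matching pattern inside tmpdir."""
        return sorted(os.path.basename(p) for p in glob.glob(os.path.join(base, pattern)))

    py_files = names('*.py')                   # the dot file is skipped
    dot_files = names('.*.py')                 # a leading dot must be matched explicitly
    unescaped = names('data[1].txt')           # [1] is a character range: looks for data1.txt
    escaped = names(glob.escape('data[1].txt'))

print(py_files, dot_files, unescaped, escaped)
```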
    

    4. pathlib

    pathlib offers classes representing filesystem paths in an object-oriented way. It was added in Python 3.4.

    4.1 Pure paths

    Pure path objects provide path-handling operations which don’t actually access a filesystem.

    import pathlib
    
    class pathlib.PurePath(*pathsegments)
    class pathlib.PurePosixPath(*pathsegments)
    class pathlib.PureWindowsPath(*pathsegments)
    
    # Examples
    >>> p = pathlib.PurePath('subdir', 'subdir_main.py')
    >>> p
    PurePosixPath('subdir/subdir_main.py')
    
    >>> p.parts
    ('subdir', 'subdir_main.py')
    >>> p.parent
    PurePosixPath('subdir')
    >>> p.suffix
    '.py'
    >>> p.stem
    'subdir_main'
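
    Because pure paths never touch the filesystem, a specific flavour can be instantiated on any platform. The sketch below joins components with the / operator and parses the Windows-style name used earlier in this article. PurePath itself picks the flavour of the machine it runs on, which is why the session above shows PurePosixPath.

```python
from pathlib import PurePosixPath, PureWindowsPath

p = PurePosixPath('subdir') / 'subdir_main.py'  # join with the / operator
print(p)                      # subdir/subdir_main.py
print(p.name)                 # subdir_main.py
print(p.with_suffix('.txt'))  # subdir/subdir_main.txt

w = PureWindowsPath('c:/directory/myfile.txt')
print(w.parts)                # ('c:\\', 'directory', 'myfile.txt')
print(w.stem, w.suffix)       # myfile .txt
```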
    

    4.2 Concrete paths

    Concrete paths are subclasses of the pure path classes. In addition to operations provided by the latter, they also provide methods to do system calls on path objects.

    import pathlib
    
    class pathlib.Path(*pathsegments)
    class pathlib.PosixPath(*pathsegments)
    class pathlib.WindowsPath(*pathsegments)
    
    # Examples
    >>> from pathlib import Path    # import the main class
    >>> p = Path('.')               # create an instance
    >>> subdirectories = [x for x in p.iterdir() if x.is_dir()]  # list subdirectories
    
    >>> q = Path('stackoverflow.py')
    >>> q.exists()
    True
    >>> q.cwd()                     # class method: the current working directory
    PosixPath('/Users/sparkandshine/git/tmp')
    >>> q.home()                    # class method: the user's home directory
    PosixPath('/Users/sparkandshine')
    >>> q.stat()
    os.stat_result(st_mode=33252, st_ino=4554446, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=160, st_atime=1485886672, st_mtime=1485886672, st_ctime=1485886672)
    
    >>> with q.open() as f:         # open the file
    ...     first_line = f.readline()   # read only the first line
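
    Concrete paths can also create directories and read or write files directly. The following is a minimal sketch, assuming Python 3.5 or later (for write_text, read_text and the exist_ok argument of mkdir). The results/run1 layout and the file name are made up for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    root = Path(tmpdir)
    out_dir = root / 'results' / 'run1'         # hypothetical layout
    out_dir.mkdir(parents=True, exist_ok=True)  # like os.makedirs(..., exist_ok=True)

    log = out_dir / 'log.txt'
    log.write_text('first line\nsecond line\n')

    first_line = log.read_text().splitlines()[0]
    entries = sorted(x.name for x in out_dir.iterdir())
    subdirs = [x.name for x in root.iterdir() if x.is_dir()]

print(first_line, entries, subdirs)
```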
    

    5. Find all files ending with an extension

    Use glob

    import glob
    
    files = glob.glob('*.py')
    # ['main.py', 'stackoverflow.py']
    
    files = glob.glob('/Users/sparkandshine/git/tmp/*.py')
    # ['/Users/sparkandshine/git/tmp/main.py', '/Users/sparkandshine/git/tmp/stackoverflow.py']
    
    files = glob.glob('**/*.py', recursive=True)    # Python 3.5+
    # ['main.py', 'stackoverflow.py', 'subdir/subdir_main.py']
    

    Use os.walk

    os.walk generates the file names in a directory tree by walking the tree. For each directory in the tree, it yields a 3-tuple (dirpath, dirnames, filenames).

    import os
    
    # Signature: os.walk(top, topdown=True, onerror=None, followlinks=False)
    
    for root, dirs, files in os.walk("."):
        for file in files:
            if file.endswith(".py"):
                print(os.path.join(root, file))
    
    # Output
    # ./main.py
    # ./stackoverflow.py
    # ./subdir/subdir_main.py
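
    When topdown is True (the default), the dirnames list can be modified in place to keep os.walk from descending into some directories. The sketch below skips hidden directories. The tree it builds, including the .git directory, is made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    # Build a small tree: main.py, subdir/subdir_main.py, .git/hook.py
    for rel in ['main.py',
                os.path.join('subdir', 'subdir_main.py'),
                os.path.join('.git', 'hook.py')]:
        full = os.path.join(top, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, 'w').close()

    found = []
    for root, dirs, files in os.walk(top):
        # Slice assignment edits the list in place, so os.walk
        # will not descend into the directories that were removed.
        dirs[:] = [d for d in dirs if not d.startswith('.')]
        for name in files:
            if name.endswith('.py'):
                found.append(os.path.relpath(os.path.join(root, name), top))

    found.sort()

print(found)  # main.py and subdir/subdir_main.py, but not .git/hook.py
```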
    

    Use pathlib

    >>> from pathlib import Path
    >>> p = Path('.')
    >>> list(p.glob('**/*.py'))
    [PosixPath('main.py'), PosixPath('stackoverflow.py'), PosixPath('subdir/subdir_main.py')]
    

    Use os.listdir

    os.listdir(path='.') returns a list containing the names of the entries in the directory given by path.

    import os
    
    files = [file for file in os.listdir('.') if file.endswith(".py")]
    # ['main.py', 'stackoverflow.py']
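
    The four approaches can be compared side by side. The sketch below rebuilds the example tree from this section in a temporary directory (notes.txt is added as a non-matching file) and collects the results as sets of relative paths. The three recursive approaches should agree, while os.listdir only sees the top level. Note that glob skips dot files, so the results could differ on a tree that contains hidden files.

```python
import glob
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as top:
    for rel in ['main.py', 'stackoverflow.py', 'notes.txt',
                os.path.join('subdir', 'subdir_main.py')]:
        full = os.path.join(top, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, 'w').close()

    # glob, recursive (Python 3.5+)
    pattern = os.path.join(glob.escape(top), '**', '*.py')
    via_glob = {os.path.relpath(p, top) for p in glob.glob(pattern, recursive=True)}

    # os.walk
    via_walk = set()
    for root, _dirs, files in os.walk(top):
        for name in files:
            if name.endswith('.py'):
                via_walk.add(os.path.relpath(os.path.join(root, name), top))

    # pathlib
    via_pathlib = {str(p.relative_to(top)) for p in Path(top).glob('**/*.py')}

    # os.listdir, top level only
    via_listdir = {name for name in os.listdir(top) if name.endswith('.py')}

print(via_glob == via_walk == via_pathlib)  # True
print(sorted(via_listdir))                  # ['main.py', 'stackoverflow.py']
```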
    



