Python Pathlib complex glob patterns
TL;DR
import itertools
import pathlib

mypath = pathlib.Path()
patterns = ["*.jpg", "*.png"]
matched = list(
    itertools.chain.from_iterable(
        mypath.glob(pattern) for pattern in patterns
    )
)
Today I learned that Python's pathlib.Path.glob does not support complex glob patterns such as the brace expansion in *.{jpg,png}!
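Here is a quick check in a throwaway directory. The file names are made up for illustration. The pattern goes through Python's fnmatch-style matching, where only *, ? and [...] are special, so the braces are treated as literal characters and nothing matches:

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    # Hypothetical sample files, created only for this demonstration.
    for name in ("a.jpg", "b.png"):
        (root / name).touch()
    # "{jpg,png}" is not expanded, so this looks for names literally ending in ".{jpg,png}".
    brace_matches = list(root.glob("*.{jpg,png}"))
    # A single plain wildcard pattern works as expected.
    jpg_matches = [p.name for p in root.glob("*.jpg")]

print(brace_matches)  # []
print(jpg_matches)  # ['a.jpg']
```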
Let's see how it can be done. First try:
import pathlib
mypath = pathlib.Path() / "tests"
matched = list(mypath.glob("*.jpg")) + list(mypath.glob("*.png"))
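It does the job, but I'm not a big fan of concatenating lists: it allocates three list objects, one per glob call plus the concatenated result. pathlib.Path.glob returns a lazy iterator (a generator in many Python versions), not a list, and any iterable can be chained. Let's see if itertools can rescue us: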
import itertools
matched = list(itertools.chain(mypath.glob("*.jpg"), mypath.glob("*.png")))
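itertools.chain does not build any intermediate list. It walks each input iterable in turn and yields its items one at a time. The itertools documentation describes it with a roughly equivalent pure-Python generator. chain_sketch below is a simplified illustration of that idea, not the real implementation, which is written in C:

```python
from collections.abc import Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")


def chain_sketch(*iterables: Iterable[T]) -> Iterator[T]:
    # Hypothetical stand-in for itertools.chain: consume each input in turn,
    # yielding items lazily without collecting them into a list.
    for iterable in iterables:
        yield from iterable


print(list(chain_sketch((n for n in [1, 2]), iter([3]))))  # [1, 2, 3]
```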
Not bad! Only a single list object is created out of multiple generators. Can we go further and apply the DRY ("Don't Repeat Yourself") principle?
matched = list(
    itertools.chain.from_iterable(
        mypath.glob(pattern) for pattern in ["*.jpg", "*.png"]
    )
)
I do not think I can do better right now. The argument passed to itertools.chain.from_iterable is a generator expression that yields one pathlib.Path.glob iterator per pattern. I do like these kinds of constructs in Python. The whole chain is lazily evaluated, so the globbing only happens while list() consumes it, and the composability feels very much like functional programming.
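If you want to keep writing the *.{jpg,png} form, one option is to expand the brace groups into plain patterns yourself and chain them as above. This is a minimal sketch. expand_braces and glob_braces are hypothetical helpers, not part of pathlib. The expansion is simplified: nested groups and escaped braces are not handled the way a shell would. The helper also drops duplicates, because overlapping patterns such as a.* and *.jpg would otherwise yield the same path twice.

```python
import itertools
import re
from collections.abc import Iterator
from pathlib import Path

# Innermost {...} group, i.e. one that contains no other braces.
_BRACE_GROUP = re.compile(r"\{([^{}]*)\}")


def expand_braces(pattern: str) -> list[str]:
    """Hypothetical helper: expand "*.{jpg,png}" into ["*.jpg", "*.png"].

    Simplified on purpose: nested groups and escaped braces are not
    handled like a shell would handle them.
    """
    match = _BRACE_GROUP.search(pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[: match.start()], pattern[match.end():]
    expanded: list[str] = []
    for option in match.group(1).split(","):
        # Recurse so later groups, e.g. in "{a,b}.{x,y}", get expanded too.
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def glob_braces(path: Path, pattern: str) -> Iterator[Path]:
    """Hypothetical helper: lazily glob every expansion of a brace pattern."""
    matches = itertools.chain.from_iterable(
        path.glob(sub_pattern) for sub_pattern in expand_braces(pattern)
    )
    seen: set[Path] = set()
    for match in matches:
        # Overlapping patterns can match the same file more than once.
        if match not in seen:
            seen.add(match)
            yield match
```

With it, the original wish reads as matched = list(glob_braces(mypath, "*.{jpg,png}")). glob_braces is itself a generator, so it stays as lazy as the chain. The only extra state is the set of paths already yielded.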