
Refactor storage operations into separate Backend classes (#348)

Following the discussion in #253 and #325, I've created a first iteration of what a `Backend` interface could look like and how the current file storage operations may be refactored into this interface. It starts from the following principles:

* `app.py` talks only to `core.py` with regards to package operations
* at configuration time, a `Backend` implementation is chosen and created for the lifetime of the configured app
* `core.py` proxies requests for packages to this `Backend()`
* The `Backend` interface/api is defined through three things
  * methods that an implementation must implement
  * methods that an implementation may override if it knows better than the defaults
  * the `PkgFile` class that is (should be) the main carrier of data
* where possible, implementation details must be hidden from concrete `Backend`s to promote extensibility
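
The split between methods an implementation must provide and defaults it may override can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Pkg`, `SketchBackend`, `InMemoryBackend` and `normalize` are hypothetical names, and `Pkg` only stands in for `PkgFile`.

```python
import abc
import re
import typing as t


class Pkg(t.NamedTuple):
    """Hypothetical stand-in for PkgFile: just a name and a version."""

    pkgname: str
    version: str


def normalize(name: str) -> str:
    """PEP 503 normalization: runs of '-', '_' and '.' collapse to one '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


class SketchBackend(abc.ABC):
    @abc.abstractmethod
    def get_all_packages(self) -> t.Iterable[Pkg]:
        """Every implementation must provide this."""

    def package_count(self) -> int:
        """Default built on get_all_packages(); override if a cheaper way exists."""
        return sum(1 for _ in self.get_all_packages())

    def get_projects(self) -> t.Set[str]:
        """Default: the unique normalized project names."""
        return {normalize(p.pkgname) for p in self.get_all_packages()}


class InMemoryBackend(SketchBackend):
    """Hypothetical backend that keeps its packages in a list."""

    def __init__(self, packages: t.List[Pkg]) -> None:
        self._packages = packages

    def get_all_packages(self) -> t.Iterable[Pkg]:
        return iter(self._packages)

    def package_count(self) -> int:
        # This backend knows better than the default: no iteration needed.
        return len(self._packages)
```

A concrete backend only has to supply `get_all_packages()`; everything else works out of the box and can be replaced when the storage offers something faster.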

Other things I've done in this PR:
* I've tried to talk about packages and projects, rather than files and prefixes, since these are the domain terms PEP 503 uses, and imho they also make the meaning clearer
* Better testability of the `CacheManager` (no more race conditions when `watchdog` is installed during testing)
* Cleaned up some more Python 2 code
* Started moving away from `os.path` and `py.path` in favour of `pathlib`
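
To illustrate the `CacheManager` testability point above: once the cache manager is passed into the backend instead of living as a module-level singleton, a test can hand in a double that never starts a file watcher. The sketch below assumes that shape; `FakeCacheManager`, `SketchCachingBackend` and `fake_listdir` are hypothetical names and package entries are plain strings for brevity.

```python
import typing as t


class FakeCacheManager:
    """Hypothetical test double: caches listings, no file watching."""

    def __init__(self) -> None:
        self.calls = 0  # how often the real listing function was invoked
        self._cache: t.Dict[str, t.List[str]] = {}

    def listdir(
        self, root: str, impl_fn: t.Callable[[str], t.Iterable[str]]
    ) -> t.List[str]:
        if root not in self._cache:
            self.calls += 1
            self._cache[root] = list(impl_fn(root))
        return self._cache[root]


def fake_listdir(root: str) -> t.Iterator[str]:
    """Stand-in for a directory walk."""
    yield f"{root}/foo-1.0.tar.gz"


class SketchCachingBackend:
    def __init__(self, roots: t.List[str], cache_manager: FakeCacheManager) -> None:
        self.roots = roots
        # Injected rather than imported, so tests control its lifetime.
        self.cache_manager = cache_manager

    def get_all_packages(self) -> t.List[str]:
        return [
            name
            for root in self.roots
            for name in self.cache_manager.listdir(root, fake_listdir)
        ]
```

Each test can create its own cache manager, so no state or observer thread leaks between tests.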

Furthermore, I've created a `plugin.py` with a sample of what I think a plugin system could look like. This sample assumes we use `argparse` and allows a plugin to extend the cli arguments with whatever it needs. I think the actual implementation of such a plugin system is beyond the scope of this PR, but I've used it as a target for the Backend refactoring. If requested, I'll remove it from this PR.
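
As a rough idea of how a plugin could extend the cli through `argparse`, here is a minimal sketch. It is not the `plugin.py` from this PR: `SketchPlugin`, `build_parser` and the `--sketch-url` option are hypothetical, and the "backend" it builds is a plain dict to keep the example self-contained.

```python
import argparse
import typing as t


class SketchPlugin:
    """Hypothetical plugin: contributes CLI arguments and builds a backend."""

    name = "sketch"

    def update_parser(self, parser: argparse.ArgumentParser) -> None:
        # Group the plugin's options so they show up together in --help.
        group = parser.add_argument_group(f"{self.name} plugin")
        group.add_argument("--sketch-url", default="memory://")

    def get_backend(self, namespace: argparse.Namespace) -> t.Dict[str, str]:
        # A real plugin would return a Backend instance here.
        return {"kind": self.name, "url": namespace.sketch_url}


def build_parser(plugins: t.Iterable[SketchPlugin]) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pypi-server-sketch")
    for plugin in plugins:
        plugin.update_parser(parser)
    return parser
```

The parser is built once from all registered plugins, and each plugin later reads its own options back from the parsed namespace.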

The following things still need to be done / discussed. These can be part of this PR or moved into their own, separate PRs:
- [ ] Simplify the `PkgFile` class. It currently consists of a number of attributes that don't necessarily belong with it, and not all attributes are aptly named (imho). I would like to minimize the scope of `PkgFile` so that its only concern is being a data carrier between the app and the backends, and make its use clearer.
- [ ] Add a `PkgFile.metadata` that backend implementations may use to store custom data for packages. For example the current `PkgFile.root` attribute is an implementation detail of the filestorage backends, and other Backend implementations should not be bothered by it.
- [ ] Use `pathlib` wherever possible. This may also result in fewer attributes for `PkgFile`, since some things may just be contained in a single `Path` object, instead of multiple strings.
- [ ] Improve testing of the `CacheManager`.
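
To make the `PkgFile.metadata` and `pathlib` items above more concrete, a slimmed-down data carrier might look like the sketch below. This is only an assumption about one possible direction: `SketchPkgFile` and the `"root"` metadata key are hypothetical and not part of this PR.

```python
import dataclasses
import typing as t
from pathlib import Path


@dataclasses.dataclass
class SketchPkgFile:
    """Hypothetical slimmed-down PkgFile."""

    pkgname: str
    version: str
    # One Path instead of separate fn / relfn / relfn_unix strings.
    path: t.Optional[Path] = None
    # Backend-specific details, e.g. the root directory of a file backend.
    metadata: t.Dict[str, t.Any] = dataclasses.field(default_factory=dict)

    @property
    def relfn_unix(self) -> t.Optional[str]:
        """Path relative to the backend's root, with forward slashes."""
        root = self.metadata.get("root")
        if self.path is None or root is None:
            return None
        return self.path.relative_to(root).as_posix()
```

For example, a file backend would store its root under `metadata["root"]`, while a backend without a filesystem would simply leave `path` and `metadata` empty.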

----
* move some functions around in preparation for backend module

* rename pkg_utils to pkg_helpers to prevent confusion with stdlib pkgutil

* further implement the current filestorage as simple file backend

* rename prefix to project, since that's more descriptive

* add digester func as attribute to pkgfile

* WIP caching backend

* WIP make cache better testable

* better testability of cache

* WIP file backends as plugin

* fix typos, run black

* Apply suggestions from code review

Co-authored-by: Matthew Planchard <mplanchard@users.noreply.github.com>

* add more type hints to pass mypy, fix tox.ini

* add package count method to backend

* add package count method to backend

* minor changes

* bugfix when checking invalid whl file

* check for existing package recursively, bugfix, some more pathlib

* fix unittest

* rm dead code

* exclude bottle.py from coverage

* fix merge mistakes

* fix tab indentation

* backend as a cli argument

* fix cli, add tests

* fix mypy

* fix more silly mistakes

* process feedback

* remove dead code

Co-authored-by: Matthew Planchard <mplanchard@users.noreply.github.com>
pull/370/head
PelleK 2 years ago committed by GitHub
parent
commit
cf424c982d
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
  1. 2
      .coveragerc
  2. 1
      .gitignore
  3. 126
      pypiserver/_app.py
  4. 305
      pypiserver/backend.py
  5. 50
      pypiserver/cache.py
  6. 91
      pypiserver/config.py
  7. 271
      pypiserver/core.py
  8. 26
      pypiserver/manage.py
  9. 112
      pypiserver/pkg_helpers.py
  10. 41
      pypiserver/plugin.py
  11. 91
      tests/test_app.py
  12. 42
      tests/test_backend.py
  13. 30
      tests/test_config.py
  14. 138
      tests/test_core.py
  15. 1
      tests/test_main.py
  16. 35
      tests/test_manage.py
  17. 116
      tests/test_pkg_helpers.py

2
.coveragerc

@ -0,0 +1,2 @@
[run]
omit = pypiserver/bottle.py

1
.gitignore vendored

@ -29,6 +29,7 @@ __pycache__/
**/*.egg-info/
/.standalone
/.coverage*
!/.coveragerc
/htmlcov/
/.installed.cfg
/develop-eggs/

126
pypiserver/_app.py

@ -1,10 +1,13 @@
from collections import namedtuple
import logging
import mimetypes
import os
import re
import zipfile
import xml.dom.minidom
import xmlrpc.client as xmlrpclib
import zipfile
from collections import namedtuple
from io import BytesIO
from urllib.parse import urljoin, urlparse
from pypiserver.config import RunConfig
from . import __version__
@ -18,26 +21,10 @@ from .bottle import (
Bottle,
template,
)
try:
import xmlrpc.client as xmlrpclib # py3
except ImportError:
import xmlrpclib # py2
try:
from io import BytesIO
except ImportError:
from StringIO import StringIO as BytesIO
try: # PY3
from urllib.parse import urljoin, urlparse
except ImportError: # PY2
from urlparse import urljoin, urlparse
from .pkg_helpers import guess_pkgname_and_version, normalize_pkgname_for_url
log = logging.getLogger(__name__)
config: RunConfig
app = Bottle()
@ -103,19 +90,13 @@ def favicon():
def root():
fp = request.custom_fullpath
try:
numpkgs = len(list(config.iter_packages()))
except Exception as exc:
log.error(f"Could not list packages: {exc}")
numpkgs = 0
# Ensure template() does not consider `msg` as filename!
msg = config.welcome_msg + "\n"
return template(
msg,
URL=request.url.rstrip("/") + "/",
VERSION=__version__,
NUMPKGS=numpkgs,
NUMPKGS=config.backend.package_count(),
PACKAGES=fp.rstrip("/") + "/packages/",
SIMPLE=fp.rstrip("/") + "/simple/",
)
@ -148,16 +129,12 @@ def remove_pkg():
if not name or not version:
msg = f"Missing 'name'/'version' fields: name={name}, version={version}"
raise HTTPError(400, msg)
pkgs = list(
filter(
lambda pkg: pkg.pkgname == name and pkg.version == version,
core.find_packages(config.iter_packages()),
)
)
if len(pkgs) == 0:
pkgs = list(config.backend.find_version(name, version))
if not pkgs:
raise HTTPError(404, f"{name} ({version}) not found")
for pkg in pkgs:
os.unlink(pkg.fn)
config.backend.remove_package(pkg)
Upload = namedtuple("Upload", "pkg sig")
@ -183,13 +160,11 @@ def file_upload():
continue
if (
not is_valid_pkg_filename(uf.raw_filename)
or core.guess_pkgname_and_version(uf.raw_filename) is None
or guess_pkgname_and_version(uf.raw_filename) is None
):
raise HTTPError(400, f"Bad filename: {uf.raw_filename}")
if not config.overwrite and core.exists(
config.package_root, uf.raw_filename
):
if not config.overwrite and config.backend.exists(uf.raw_filename):
log.warning(
f"Cannot upload {uf.raw_filename!r} since it already exists! \n"
" You may start server with `--overwrite` option. "
@ -200,7 +175,7 @@ def file_upload():
" You may start server with `--overwrite` option.",
)
core.store(config.package_root, uf.raw_filename, uf.save)
config.backend.add_package(uf.raw_filename, uf.file)
if request.auth:
user = request.auth[0]
else:
@ -231,10 +206,10 @@ def update():
@app.route("/simple")
@app.route("/simple/:prefix")
@app.route("/simple/:project")
@app.route("/packages")
@auth("list")
def pep_503_redirects(prefix=None):
def pep_503_redirects(project=None):
return redirect(request.custom_fullpath + "/", 301)
@ -257,7 +232,7 @@ def handle_rpc():
)
response = []
ordering = 0
for p in config.iter_packages():
for p in config.backend.get_all_packages():
if p.pkgname.count(value) > 0:
# We do not presently have any description/summary, returning
# version instead
@ -278,7 +253,7 @@ def handle_rpc():
@app.route("/simple/")
@auth("list")
def simpleindex():
links = sorted(core.get_prefixes(config.iter_packages()))
links = sorted(config.backend.get_projects())
tmpl = """\
<html>
<head>
@ -295,59 +270,62 @@ def simpleindex():
return template(tmpl, links=links)
@app.route("/simple/:prefix/")
@app.route("/simple/:project/")
@auth("list")
def simple(prefix=""):
# PEP 503: require normalized prefix
normalized = core.normalize_pkgname_for_url(prefix)
if prefix != normalized:
return redirect("/simple/{0}/".format(normalized), 301)
files = sorted(
core.find_packages(config.iter_packages(), prefix=prefix),
def simple(project):
# PEP 503: require normalized project
normalized = normalize_pkgname_for_url(project)
if project != normalized:
return redirect(f"/simple/{normalized}/", 301)
packages = sorted(
config.backend.find_project_packages(project),
key=lambda x: (x.parsed_version, x.relfn),
)
if not files:
if not packages:
if not config.disable_fallback:
return redirect(f"{config.fallback_url.rstrip('/')}/{prefix}/")
return redirect(f"{config.fallback_url.rstrip('/')}/{project}/")
return HTTPError(404, f"Not Found ({normalized} does not exist)\n\n")
fp = request.custom_fullpath
links = [
current_uri = request.custom_fullpath
links = (
(
os.path.basename(f.relfn),
urljoin(fp, f"../../packages/{f.fname_and_hash(config.hash_algo)}"),
os.path.basename(pkg.relfn),
urljoin(current_uri, f"../../packages/{pkg.fname_and_hash}"),
)
for f in files
]
for pkg in packages
)
tmpl = """\
<html>
<head>
<title>Links for {{prefix}}</title>
<title>Links for {{project}}</title>
</head>
<body>
<h1>Links for {{prefix}}</h1>
<h1>Links for {{project}}</h1>
% for file, href in links:
<a href="{{href}}">{{file}}</a><br>
% end
</body>
</html>
"""
return template(tmpl, prefix=prefix, links=links)
return template(tmpl, project=project, links=links)
@app.route("/packages/")
@auth("list")
def list_packages():
fp = request.custom_fullpath
files = sorted(
core.find_packages(config.iter_packages()),
packages = sorted(
config.backend.get_all_packages(),
key=lambda x: (os.path.dirname(x.relfn), x.pkgname, x.parsed_version),
)
links = [
(f.relfn_unix, urljoin(fp, f.fname_and_hash(config.hash_algo)))
for f in files
]
links = (
(pkg.relfn_unix, urljoin(fp, pkg.fname_and_hash)) for pkg in packages
)
tmpl = """\
<html>
<head>
@ -367,7 +345,7 @@ def list_packages():
@app.route("/packages/:filename#.*#")
@auth("download")
def server_static(filename):
entries = core.find_packages(config.iter_packages())
entries = config.backend.get_all_packages()
for x in entries:
f = x.relfn_unix
if f == filename:
@ -385,8 +363,8 @@ def server_static(filename):
return HTTPError(404, f"Not Found ({filename} does not exist)\n\n")
@app.route("/:prefix")
@app.route("/:prefix/")
def bad_url(prefix):
@app.route("/:project")
@app.route("/:project/")
def bad_url(project):
"""Redirect unknown root URLs to /simple/."""
return redirect(core.get_bad_url_redirect_path(request, prefix))
return redirect(core.get_bad_url_redirect_path(request, project))

305
pypiserver/backend.py

@ -0,0 +1,305 @@
import abc
import functools
import hashlib
import itertools
import os
import typing as t
from pathlib import Path
from .cache import CacheManager, ENABLE_CACHING
from .core import PkgFile
from .pkg_helpers import (
normalize_pkgname,
is_listed_path,
guess_pkgname_and_version,
)
if t.TYPE_CHECKING:
from .config import _ConfigCommon as Configuration
PathLike = t.Union[str, os.PathLike]
class IBackend(abc.ABC):
@abc.abstractmethod
def get_all_packages(self) -> t.Iterable[PkgFile]:
pass
@abc.abstractmethod
def find_project_packages(self, project: str) -> t.Iterable[PkgFile]:
pass
@abc.abstractmethod
def find_version(self, name: str, version: str) -> t.Iterable[PkgFile]:
pass
@abc.abstractmethod
def get_projects(self) -> t.Iterable[str]:
pass
@abc.abstractmethod
def exists(self, filename: str) -> bool:
pass
@abc.abstractmethod
def digest(self, pkg: PkgFile) -> t.Optional[str]:
pass
@abc.abstractmethod
def package_count(self) -> int:
pass
@abc.abstractmethod
def add_package(self, filename: str, stream: t.BinaryIO) -> None:
pass
@abc.abstractmethod
def remove_package(self, pkg: PkgFile) -> None:
pass
class Backend(IBackend, abc.ABC):
def __init__(self, config: "Configuration"):
self.hash_algo = config.hash_algo
@abc.abstractmethod
def get_all_packages(self) -> t.Iterable[PkgFile]:
"""Implement this method to return an Iterable of all packages (as
PkgFile objects) that are available in the Backend.
"""
pass
@abc.abstractmethod
def add_package(self, filename: str, stream: t.BinaryIO) -> None:
"""Add a package to the Backend. `filename` is the package's filename
(without any directory parts). It is just a name, there is no file by
that name (yet). `stream` is an open file-like object that can be used
to read the file's content. To convert the package into an actual file
on disk, run `write_file(filename, stream)`.
"""
pass
@abc.abstractmethod
def remove_package(self, pkg: PkgFile) -> None:
"""Remove a package from the Backend"""
pass
@abc.abstractmethod
def exists(self, filename: str) -> bool:
"""Does a package by the given name exist?"""
pass
def digest(self, pkg: PkgFile) -> t.Optional[str]:
if self.hash_algo is None or pkg.fn is None:
return None
return digest_file(pkg.fn, self.hash_algo)
def package_count(self) -> int:
"""Return a count of all available packages. When implementing a Backend
class, either use this method as is, or override it with a more
performant version.
"""
return sum(1 for _ in self.get_all_packages())
def get_projects(self) -> t.Iterable[str]:
"""Return an iterable of all (unique) projects available in the store
in their PEP503 normalized form. When implementing a Backend class,
either use this method as is, or override it with a more performant
version.
"""
return set(package.pkgname_norm for package in self.get_all_packages())
def find_project_packages(self, project: str) -> t.Iterable[PkgFile]:
"""Find all packages from a given project. The project may be given
as either the normalized or canonical name. When implementing a
Backend class, either use this method as is, or override it with a
more performant version.
"""
return (
x
for x in self.get_all_packages()
if normalize_pkgname(project) == x.pkgname_norm
)
def find_version(self, name: str, version: str) -> t.Iterable[PkgFile]:
"""Return all packages that match `PkgFile.pkgname == name` and
`PkgFile.version == version`. When implementing a Backend class,
either use this method as is, or override it with a more performant
version.
"""
return filter(
lambda pkg: pkg.pkgname == name and pkg.version == version,
self.get_all_packages(),
)
class SimpleFileBackend(Backend):
def __init__(self, config: "Configuration"):
super().__init__(config)
self.roots = [Path(root).resolve() for root in config.roots]
def get_all_packages(self) -> t.Iterable[PkgFile]:
return itertools.chain.from_iterable(listdir(r) for r in self.roots)
def add_package(self, filename: str, stream: t.BinaryIO) -> None:
write_file(stream, self.roots[0].joinpath(filename))
def remove_package(self, pkg: PkgFile) -> None:
if pkg.fn is not None:
os.remove(pkg.fn)
def exists(self, filename: str) -> bool:
return any(
filename == existing_file.name
for root in self.roots
for existing_file in all_listed_files(root)
)
class CachingFileBackend(SimpleFileBackend):
def __init__(
self,
config: "Configuration",
cache_manager: t.Optional[CacheManager] = None,
):
super().__init__(config)
self.cache_manager = cache_manager or CacheManager() # type: ignore
def get_all_packages(self) -> t.Iterable[PkgFile]:
return itertools.chain.from_iterable(
self.cache_manager.listdir(r, listdir) for r in self.roots
)
def digest(self, pkg: PkgFile) -> t.Optional[str]:
if self.hash_algo is None or pkg.fn is None:
return None
return self.cache_manager.digest_file(
pkg.fn, self.hash_algo, digest_file
)
def write_file(fh: t.BinaryIO, destination: PathLike) -> None:
"""write a byte stream into a destination file. Writes are chunked to reduce
the memory footprint
"""
chunk_size = 2 ** 20 # 1 MB
offset = fh.tell()
try:
with open(destination, "wb") as dest:
for chunk in iter(lambda: fh.read(chunk_size), b""):
dest.write(chunk)
finally:
fh.seek(offset)
def listdir(root: Path) -> t.Iterator[PkgFile]:
root = root.resolve()
files = all_listed_files(root)
yield from valid_packages(root, files)
def all_listed_files(root: Path) -> t.Iterator[Path]:
for dirpath, dirnames, filenames in os.walk(root):
dirnames[:] = (
dirname for dirname in dirnames if is_listed_path(Path(dirname))
)
for filename in filenames:
if not is_listed_path(Path(filename)):
continue
filepath = root / dirpath / filename
if Path(filepath).is_file():
yield filepath
def valid_packages(root: Path, files: t.Iterable[Path]) -> t.Iterator[PkgFile]:
for file in files:
res = guess_pkgname_and_version(str(file.name))
if res is not None:
pkgname, version = res
fn = str(file)
root_name = str(root)
yield PkgFile(
pkgname=pkgname,
version=version,
fn=fn,
root=root_name,
relfn=fn[len(root_name) + 1 :],
)
def digest_file(file_path: PathLike, hash_algo: str) -> str:
"""
Reads and digests a file according to the specified hashing algorithm.
:param file_path: path to a file on disk
:param hash_algo: any algo contained in :mod:`hashlib`
:return: <hash_algo>=<hex_digest>
From http://stackoverflow.com/a/21565932/548792
"""
blocksize = 2 ** 16
digester = hashlib.new(hash_algo)
with open(file_path, "rb") as f:
for block in iter(lambda: f.read(blocksize), b""):
digester.update(block)
return f"{hash_algo}={digester.hexdigest()}"
def get_file_backend(config: "Configuration") -> Backend:
if ENABLE_CACHING:
return CachingFileBackend(config)
return SimpleFileBackend(config)
PkgFunc = t.TypeVar("PkgFunc", bound=t.Callable[..., t.Iterable[PkgFile]])
def with_digester(func: PkgFunc) -> PkgFunc:
@functools.wraps(func)
def add_digester_method(
self: "BackendProxy", *args: t.Any, **kwargs: t.Any
) -> t.Iterable[PkgFile]:
packages = func(self, *args, **kwargs)
for package in packages:
package.digester = self.backend.digest
yield package
return t.cast(PkgFunc, add_digester_method)
class BackendProxy(IBackend):
def __init__(self, wraps: Backend):
self.backend = wraps
@with_digester
def get_all_packages(self) -> t.Iterable[PkgFile]:
return self.backend.get_all_packages()
@with_digester
def find_project_packages(self, project: str) -> t.Iterable[PkgFile]:
return self.backend.find_project_packages(project)
def find_version(self, name: str, version: str) -> t.Iterable[PkgFile]:
return self.backend.find_version(name, version)
def get_projects(self) -> t.Iterable[str]:
return self.backend.get_projects()
def exists(self, filename: str) -> bool:
assert "/" not in filename
return self.backend.exists(filename)
def package_count(self) -> int:
return self.backend.package_count()
def add_package(self, filename: str, fh: t.BinaryIO) -> None:
assert "/" not in filename
return self.backend.add_package(filename, fh)
def remove_package(self, pkg: PkgFile) -> None:
return self.backend.remove_package(pkg)
def digest(self, pkg: PkgFile) -> t.Optional[str]:
return self.backend.digest(pkg)

50
pypiserver/cache.py

@ -4,10 +4,24 @@
#
from os.path import dirname
from watchdog.observers import Observer
from pathlib import Path
import typing as t
import threading
try:
from watchdog.observers import Observer
ENABLE_CACHING = True
except ImportError:
Observer = None
ENABLE_CACHING = False
if t.TYPE_CHECKING:
from pypiserver.core import PkgFile
class CacheManager:
"""
@ -26,6 +40,11 @@ class CacheManager:
"""
def __init__(self):
if not ENABLE_CACHING:
raise RuntimeError(
"Please install the extra cache requirements by running 'pip "
"install pypiserver[cache]' to use the CachingFileBackend"
)
# Cache for listdir output
self.listdir_cache = {}
@ -46,7 +65,12 @@ class CacheManager:
self.digest_lock = threading.Lock()
self.listdir_lock = threading.Lock()
def listdir(self, root, impl_fn):
def listdir(
self,
root: t.Union[Path, str],
impl_fn: t.Callable[[Path], t.Iterable["PkgFile"]],
) -> t.Iterable["PkgFile"]:
root = str(root)
with self.listdir_lock:
try:
return self.listdir_cache[root]
@ -56,11 +80,13 @@ class CacheManager:
if root not in self.watched:
self._watch(root)
v = list(impl_fn(root))
v = list(impl_fn(Path(root)))
self.listdir_cache[root] = v
return v
def digest_file(self, fpath, hash_algo, impl_fn):
def digest_file(
self, fpath: str, hash_algo: str, impl_fn: t.Callable[[str, str], str]
) -> str:
with self.digest_lock:
try:
cache = self.digest_cache[hash_algo]
@ -82,13 +108,17 @@ class CacheManager:
cache[fpath] = v
return v
def _watch(self, root):
def _watch(self, root: str):
self.watched.add(root)
self.observer.schedule(_EventHandler(self, root), root, recursive=True)
def invalidate_root_cache(self, root: t.Union[Path, str]):
with self.listdir_lock:
self.listdir_cache.pop(str(root), None)
class _EventHandler:
def __init__(self, cache, root):
def __init__(self, cache: CacheManager, root: str):
self.cache = cache
self.root = root
@ -101,8 +131,7 @@ class _EventHandler:
return
# Lazy: just invalidate the whole cache
with cache.listdir_lock:
cache.listdir_cache.pop(self.root, None)
cache.invalidate_root_cache(self.root)
# Digests are more expensive: invalidate specific paths
paths = []
@ -117,6 +146,3 @@ class _EventHandler:
for _, subcache in cache.digest_cache.items():
for path in paths:
subcache.pop(path, None)
cache_manager = CacheManager()

91
pypiserver/config.py

@ -37,25 +37,31 @@ import argparse
import contextlib
import hashlib
import io
import itertools
import logging
import pathlib
import pkg_resources
import re
import sys
import textwrap
import typing as t
from distutils.util import strtobool as strtoint
# The `passlib` requirement is optional, so we need to verify its import here.
import pkg_resources
from pypiserver.backend import (
SimpleFileBackend,
CachingFileBackend,
Backend,
IBackend,
get_file_backend,
BackendProxy,
)
# The `passlib` requirement is optional, so we need to verify its import here.
try:
from passlib.apache import HtpasswdFile
except ImportError:
HtpasswdFile = None
from pypiserver import core
# The "strtobool" function in distutils does a nice job at parsing strings,
# but returns an integer. This just wraps it in a boolean call so that we
@ -80,6 +86,7 @@ class DEFAULTS:
PACKAGE_DIRECTORIES = [pathlib.Path("~/packages").expanduser().resolve()]
PORT = 8080
SERVER_METHOD = "auto"
BACKEND = "auto"
def auth_arg(arg: str) -> t.List[str]:
@ -236,6 +243,28 @@ def add_common_args(parser: argparse.ArgumentParser) -> None:
"standard python library)"
),
)
parser.add_argument(
"--hash-algo",
default=DEFAULTS.HASH_ALGO,
type=hash_algo_arg,
help=(
"Any `hashlib` available algorithm to use for generating fragments "
"on package links. Can be disabled with one of (0, no, off, false)."
),
)
parser.add_argument(
"--backend",
default=DEFAULTS.BACKEND,
choices=("auto", "simple-dir", "cached-dir"),
dest="backend_arg",
help=(
"A backend implementation. Keep the default 'auto' to automatically"
" determine whether to activate caching or not"
),
)
parser.add_argument(
"--version",
action="version",
@ -254,7 +283,6 @@ def get_parser() -> argparse.ArgumentParser:
"directories starting with a dot. Multiple package directories "
"may be specified."
),
# formatter_class=argparse.RawTextHelpFormatter,
formatter_class=PreserveWhitespaceRawTextHelpFormatter,
epilog=(
"Visit https://github.com/pypiserver/pypiserver "
@ -381,15 +409,6 @@ def get_parser() -> argparse.ArgumentParser:
action="store_true",
help="Allow overwriting existing package files during upload.",
)
run_parser.add_argument(
"--hash-algo",
default=DEFAULTS.HASH_ALGO,
type=hash_algo_arg,
help=(
"Any `hashlib` available algorithm to use for generating fragments "
"on package links. Can be disabled with one of (0, no, off, false)."
),
)
run_parser.add_argument(
"--welcome",
metavar="HTML_FILE",
@ -504,9 +523,12 @@ def get_parser() -> argparse.ArgumentParser:
TConf = t.TypeVar("TConf", bound="_ConfigCommon")
BackendFactory = t.Callable[["_ConfigCommon"], Backend]
class _ConfigCommon:
hash_algo: t.Optional[str] = None
def __init__(
self,
roots: t.List[pathlib.Path],
@ -514,6 +536,8 @@ class _ConfigCommon:
log_frmt: str,
log_file: t.Optional[str],
log_stream: t.Optional[t.IO],
hash_algo: t.Optional[str],
backend_arg: str,
) -> None:
"""Construct a RuntimeConfig."""
# Global arguments
@ -521,18 +545,24 @@ class _ConfigCommon:
self.log_file = log_file
self.log_stream = log_stream
self.log_frmt = log_frmt
self.roots = roots
self.hash_algo = hash_algo
self.backend_arg = backend_arg
# Derived properties are directly based on other properties and are not
# included in equality checks.
self._derived_properties: t.Tuple[str, ...] = (
"iter_packages",
"package_root",
"backend",
)
# The first package directory is considered the root. This is used
# for uploads.
self.package_root = self.roots[0]
self.backend = self.get_backend(backend_arg)
@classmethod
def from_namespace(
cls: t.Type[TConf], namespace: argparse.Namespace
@ -551,6 +581,8 @@ class _ConfigCommon:
log_stream=namespace.log_stream,
log_frmt=namespace.log_frmt,
roots=namespace.package_directory,
hash_algo=namespace.hash_algo,
backend_arg=namespace.backend_arg,
)
@property
@ -565,13 +597,17 @@ class _ConfigCommon:
# If we've specified 3 or more levels of verbosity, just return not set.
return levels.get(self.verbosity, logging.NOTSET)
def iter_packages(self) -> t.Iterator[core.PkgFile]:
"""Iterate over packages in root directories."""
yield from (
itertools.chain.from_iterable(
core.listdir(str(r)) for r in self.roots
)
)
def get_backend(self, arg: str) -> IBackend:
available_backends: t.Dict[str, BackendFactory] = {
"auto": get_file_backend,
"simple-dir": SimpleFileBackend,
"cached-dir": CachingFileBackend,
}
backend = available_backends[arg]
return BackendProxy(backend(self))
def with_updates(self: TConf, **kwargs: t.Any) -> TConf:
"""Create a new config with the specified updates.
@ -624,7 +660,6 @@ class RunConfig(_ConfigCommon):
fallback_url: str,
server_method: str,
overwrite: bool,
hash_algo: t.Optional[str],
welcome_msg: str,
cache_control: t.Optional[int],
log_req_frmt: str,
@ -643,13 +678,11 @@ class RunConfig(_ConfigCommon):
self.fallback_url = fallback_url
self.server_method = server_method
self.overwrite = overwrite
self.hash_algo = hash_algo
self.welcome_msg = welcome_msg
self.cache_control = cache_control
self.log_req_frmt = log_req_frmt
self.log_res_frmt = log_res_frmt
self.log_err_frmt = log_err_frmt
# Derived properties
self._derived_properties = self._derived_properties + ("auther",)
self.auther = self.get_auther(auther)
@ -669,7 +702,6 @@ class RunConfig(_ConfigCommon):
"fallback_url": namespace.fallback_url,
"server_method": namespace.server,
"overwrite": namespace.overwrite,
"hash_algo": namespace.hash_algo,
"welcome_msg": namespace.welcome,
"cache_control": namespace.cache_control,
"log_req_frmt": namespace.log_req_frmt,
@ -752,6 +784,9 @@ class UpdateConfig(_ConfigCommon):
}
Configuration = t.Union[RunConfig, UpdateConfig]
class Config:
"""Config constructor for building a config from args."""
@ -767,9 +802,7 @@ class Config:
return default_config.with_updates(**overrides)
@classmethod
def from_args(
cls, args: t.Sequence[str] = None
) -> t.Union[RunConfig, UpdateConfig]:
def from_args(cls, args: t.Sequence[str] = None) -> Configuration:
"""Construct a Config from the passed args or sys.argv."""
# If pulling args from sys.argv (commandline arguments), argv[0] will
# be the program name, (i.e. pypi-server), so we don't need to

271
pypiserver/core.py

@ -1,154 +1,69 @@
#! /usr/bin/env python
#! /usr/bin/env python3
"""minimal PyPI like server for use with pip/easy_install"""
import hashlib
import logging
import mimetypes
import os
import re
import typing as t
from urllib.parse import quote
log = logging.getLogger(__name__)
from pypiserver.pkg_helpers import normalize_pkgname, parse_version
mimetypes.add_type("application/octet-stream", ".egg")
mimetypes.add_type("application/octet-stream", ".whl")
mimetypes.add_type("text/plain", ".asc")
# ### Next 2 functions adapted from :mod:`distribute.pkg_resources`.
#
component_re = re.compile(r"(\d+ | [a-z]+ | \.| -)", re.I | re.VERBOSE)
replace = {"pre": "c", "preview": "c", "-": "final-", "rc": "c", "dev": "@"}.get
def _parse_version_parts(s):
for part in component_re.split(s):
part = replace(part, part)
if part in ["", "."]:
continue
if part[:1] in "0123456789":
yield part.zfill(8) # pad for numeric comparison
else:
yield "*" + part
yield "*final" # ensure that alpha/beta/candidate are before final
def parse_version(s):
parts = []
for part in _parse_version_parts(s.lower()):
if part.startswith("*"):
# remove trailing zeros from each series of numeric parts
while parts and parts[-1] == "00000000":
parts.pop()
parts.append(part)
return tuple(parts)
#
#### -- End of distribute's code.
_archive_suffix_rx = re.compile(
r"(\.zip|\.tar\.gz|\.tgz|\.tar\.bz2|-py[23]\.\d-.*|"
r"\.win-amd64-py[23]\.\d\..*|\.win32-py[23]\.\d\..*|\.egg)$",
re.I,
)
wheel_file_re = re.compile(
r"""^(?P<namever>(?P<name>.+?)-(?P<ver>\d.*?))
((-(?P<build>\d.*?))?-(?P<pyver>.+?)-(?P<abi>.+?)-(?P<plat>.+?)
\.whl|\.dist-info)$""",
re.VERBOSE,
)
_pkgname_re = re.compile(r"-\d+[a-z_.!+]", re.I)
_pkgname_parts_re = re.compile(
r"[\.\-](?=cp\d|py\d|macosx|linux|sunos|solaris|irix|aix|cygwin|win)", re.I
)
def _guess_pkgname_and_version_wheel(basename):
m = wheel_file_re.match(basename)
if not m:
return None, None
name = m.group("name")
ver = m.group("ver")
build = m.group("build")
if build:
return name, ver + "-" + build
else:
return name, ver
def guess_pkgname_and_version(path):
path = os.path.basename(path)
if path.endswith(".asc"):
path = path.rstrip(".asc")
if path.endswith(".whl"):
return _guess_pkgname_and_version_wheel(path)
if not _archive_suffix_rx.search(path):
return
path = _archive_suffix_rx.sub("", path)
if "-" not in path:
pkgname, version = path, ""
elif path.count("-") == 1:
pkgname, version = path.split("-", 1)
elif "." not in path:
pkgname, version = path.rsplit("-", 1)
else:
pkgname = _pkgname_re.split(path)[0]
ver_spec = path[len(pkgname) + 1 :]
parts = _pkgname_parts_re.split(ver_spec)
version = parts[0]
return pkgname, version
def normalize_pkgname(name):
"""Perform PEP 503 normalization"""
return re.sub(r"[-_.]+", "-", name).lower()
def normalize_pkgname_for_url(name):
"""Perform PEP 503 normalization and ensure the value is safe for URLs."""
return quote(re.sub(r"[-_.]+", "-", name).lower())
def is_allowed_path(path_part):
p = path_part.replace("\\", "/")
return not (p.startswith(".") or "/." in p)
def get_bad_url_redirect_path(request, project):
"""Get the path for a bad root url."""
uri = request.custom_fullpath
if uri.endswith("/"):
uri = uri[:-1]
uri = uri.rsplit("/", 1)[0]
project = quote(project)
uri += f"/simple/{project}/"
return uri
class PkgFile:
__slots__ = [
"fn",
"root",
"_fname_and_hash",
"relfn",
"relfn_unix",
"pkgname_norm",
"pkgname",
"version",
"parsed_version",
"replaces",
"pkgname", # The projects/package name with possible capitalization
"version", # The package version as a string
"fn", # The full file path
"root", # An optional root directory of the file
"relfn", # The file path relative to the root
"replaces", # The previous version of the package (used by manage.py)
"pkgname_norm", # The PEP503 normalized project name
"digest", # The file digest in the form of <algo>=<hash>
"relfn_unix", # The relative file path in unix notation
"parsed_version", # The package version as a tuple of parts
"digester", # a function that calculates the digest for the package
]
digest: t.Optional[str]
digester: t.Optional[t.Callable[["PkgFile"], t.Optional[str]]]
parsed_version: tuple
relfn_unix: t.Optional[str]
def __init__(
self, pkgname, version, fn=None, root=None, relfn=None, replaces=None
self,
pkgname: str,
version: str,
fn: t.Optional[str] = None,
root: t.Optional[str] = None,
relfn: t.Optional[str] = None,
replaces: t.Optional["PkgFile"] = None,
):
self.pkgname = pkgname
self.pkgname_norm = normalize_pkgname(pkgname)
self.version = version
self.parsed_version = parse_version(version)
self.parsed_version: tuple = parse_version(version)
self.fn = fn
self.root = root
self.relfn = relfn
self.relfn_unix = None if relfn is None else relfn.replace("\\", "/")
self.replaces = replaces
self.digest = None
self.digester = None
def __repr__(self):
def __repr__(self) -> str:
return "{}({})".format(
self.__class__.__name__,
", ".join(
@ -159,109 +74,9 @@ class PkgFile:
),
)
def fname_and_hash(self, hash_algo):
if not hasattr(self, "_fname_and_hash"):
if hash_algo:
self._fname_and_hash = (
f"{self.relfn_unix}#{hash_algo}="
f"{digest_file(self.fn, hash_algo)}"
)
else:
self._fname_and_hash = self.relfn_unix
return self._fname_and_hash
def _listdir(root: str) -> t.Iterable[PkgFile]:
root = os.path.abspath(root)
for dirpath, dirnames, filenames in os.walk(root):
dirnames[:] = [x for x in dirnames if is_allowed_path(x)]
for x in filenames:
fn = os.path.join(root, dirpath, x)
if not is_allowed_path(x) or not os.path.isfile(fn):
continue
res = guess_pkgname_and_version(x)
if not res:
# Seems the current file isn't a proper package
continue
pkgname, version = res
if pkgname:
yield PkgFile(
pkgname=pkgname,
version=version,
fn=fn,
root=root,
relfn=fn[len(root) + 1 :],
)
def find_packages(pkgs, prefix=""):
prefix = normalize_pkgname(prefix)
for x in pkgs:
if prefix and x.pkgname_norm != prefix:
continue
yield x
def get_prefixes(pkgs):
normalized_pkgnames = set()
for x in pkgs:
if x.pkgname:
normalized_pkgnames.add(x.pkgname_norm)
return normalized_pkgnames
def exists(root, filename):
assert "/" not in filename
dest_fn = os.path.join(root, filename)
return os.path.exists(dest_fn)
def store(root, filename, save_method):
assert "/" not in filename
dest_fn = os.path.join(root, filename)
save_method(dest_fn, overwrite=True) # Overwrite check was done earlier.
def get_bad_url_redirect_path(request, prefix):
"""Get the path for a bad root url."""
p = request.custom_fullpath
if p.endswith("/"):
p = p[:-1]
p = p.rsplit("/", 1)[0]
prefix = quote(prefix)
p += "/simple/{}/".format(prefix)
return p
def _digest_file(fpath, hash_algo):
"""
Reads and digests a file according to the specified hashing algorithm.
:param str hash_algo: any algo contained in :mod:`hashlib`
:return: the hex digest of the file contents
From http://stackoverflow.com/a/21565932/548792
"""
blocksize = 2 ** 16
digester = hashlib.new(hash_algo)
with open(fpath, "rb") as f:
for block in iter(lambda: f.read(blocksize), b""):
digester.update(block)
return digester.hexdigest()
try:
from .cache import cache_manager
def listdir(root: str) -> t.Iterable[PkgFile]:
# root must be absolute path
return cache_manager.listdir(root, _listdir)
def digest_file(fpath, hash_algo):
# fpath must be absolute path
return cache_manager.digest_file(fpath, hash_algo, _digest_file)
except ImportError:
listdir = _listdir
digest_file = _digest_file
@property
def fname_and_hash(self) -> str:
if self.digest is None and self.digester is not None:
self.digest = self.digester(self)
hashpart = f"#{self.digest}" if self.digest else ""
return self.relfn_unix + hashpart # type: ignore
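
The new `fname_and_hash` property above computes the digest lazily through an optional `digester` callback. A minimal sketch of that pattern follows, assuming a simplified stand-in class: `LazyDigestFile`, its `content` attribute, and `sha256_digester` are hypothetical names for illustration and are not part of this PR.

```python
import hashlib
import typing as t


class LazyDigestFile:
    """Hypothetical, simplified stand-in for PkgFile (not the project's class)."""

    def __init__(self, relfn_unix: str, content: bytes) -> None:
        self.relfn_unix = relfn_unix
        self.content = content  # assumption: bytes held in memory, for the demo only
        self.digest: t.Optional[str] = None
        self.digester: t.Optional[
            t.Callable[["LazyDigestFile"], t.Optional[str]]
        ] = None

    @property
    def fname_and_hash(self) -> str:
        # Compute the digest only on first access, then reuse the stored value.
        if self.digest is None and self.digester is not None:
            self.digest = self.digester(self)
        hashpart = f"#{self.digest}" if self.digest else ""
        return self.relfn_unix + hashpart


def sha256_digester(pkg: LazyDigestFile) -> str:
    """Return the digest in the <algo>=<hash> form used by the digest slot."""
    return "sha256=" + hashlib.sha256(pkg.content).hexdigest()


pkg = LazyDigestFile("demo-1.0.tar.gz", b"example bytes")
without_digester = pkg.fname_and_hash  # no digester set, so no hash fragment
pkg.digester = sha256_digester
with_digester = pkg.fname_and_hash  # digest computed once and cached on pkg.digest
```

With this shape the data carrier never needs to know how a digest is produced; whoever creates the object decides by supplying the callback, which appears to be the intent of the `digester` slot.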

pypiserver/manage.py

@ -6,13 +6,15 @@ import itertools
import os
import sys
from distutils.version import LooseVersion
from pathlib import Path
from subprocess import call
from xmlrpc.client import Server
import pip
from . import core
from xmlrpc.client import Server
from .backend import listdir
from .core import PkgFile
from .pkg_helpers import normalize_pkgname, parse_version
def make_pypi_client(url):
@ -41,7 +43,7 @@ def filter_latest_pkgs(pkgs):
pkgname2latest = {}
for x in pkgs:
pkgname = core.normalize_pkgname(x.pkgname)
pkgname = normalize_pkgname(x.pkgname)
if pkgname not in pkgname2latest:
pkgname2latest[pkgname] = x
@ -53,9 +55,9 @@ def filter_latest_pkgs(pkgs):
def build_releases(pkg, versions):
for x in versions:
parsed_version = core.parse_version(x)
parsed_version = parse_version(x)
if parsed_version > pkg.parsed_version:
yield core.PkgFile(pkgname=pkg.pkgname, version=x, replaces=pkg)
yield PkgFile(pkgname=pkg.pkgname, version=x, replaces=pkg)
def find_updates(pkgset, stable_only=True):
@ -98,7 +100,8 @@ def find_updates(pkgset, stable_only=True):
if no_releases:
sys.stdout.write(
f"no releases found on pypi for {', '.join(sorted(no_releases))}\n\n"
f"no releases found on pypi for"
f" {', '.join(sorted(no_releases))}\n\n"
)
return need_update
@ -135,8 +138,7 @@ class PipCmd:
def update_package(pkg, destdir, dry_run=False):
"""Print and optionally execute a package update."""
print(
"# update {0.pkgname} from {0.replaces.version} to "
"{0.version}".format(pkg)
f"# update {pkg.pkgname} from {pkg.replaces.version} to {pkg.version}"
)
cmd = tuple(
@ -148,7 +150,7 @@ def update_package(pkg, destdir, dry_run=False):
)
)
print("{}\n".format(" ".join(cmd)))
print(" ".join(cmd), end="\n\n")
if not dry_run:
call(cmd)
@ -171,7 +173,9 @@ def update(pkgset, destdir=None, dry_run=False, stable_only=True):