Static Sites Authenticating Proxy
This project provides the core of the new static sites infrastructure.
gitlab-pages alone does not perform any authentication, hence the need for httpd in front of it. httpd, if configured to do so, authenticates users before proxying all requests to the gitlab-pages pods. gitlab-pages uses the Host header to determine which assets the client is requesting (the Host value must match one of the repository's custom domains defined in its Pages section).
httpd's extra configuration is stored in a dedicated Secret in the same namespace, and every change to the Secret's value triggers an automatic reload of the configuration. Each configuration file (a key in the Secret) is generated automatically by gitlab-pages-site-operator.
General architecture:
- The backend server is a GitLab Pages server run and maintained by the GitLab team.
- The backend ingress controller has basic authentication enabled to ensure that ONLY the authenticating proxy can connect to it.
- The backend CNAME resolves to multiple IP addresses. Each authenticating proxy pod picks one of them at random to provide high availability.
- The gitlab-pages-authenticating-proxy has a special vhost configured that is used as a healthcheck to make sure the backend is alive.
- The GitLab Pages server behind the backend ingress controller has no authentication mechanism configured. SSO is implemented in this gitlab-pages-authenticating-proxy, so all requests reaching the backend are anonymous from the GitLab Pages server's point of view.
- Requests are routed as follows:
  - The user connects to the gitlab-pages-authenticating-proxy through the standard OKD ingress controller, which also takes care of TLS termination.
  - The proxy sends the backend hostname in the TLS SNI extension (SNI is part of the TLS handshake, not an HTTP header), causing the backend's ingress controller to perform basic auth and route the request to the GitLab Pages server.
  - The proxy sets the originally requested hostname in the Host header, allowing the GitLab Pages server to know which site to serve.
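The split between SNI and the Host header can be illustrated with a minimal Python sketch. It is not the proxy's actual implementation (that is done by httpd). The function names, port 443, the GET request, and the credentials passed in are assumptions made for illustration only.

```python
import base64
import socket
import ssl


def build_backend_request(site_host, path, username, password):
    """Build a raw HTTP/1.1 request as the proxy might send it to the backend.

    The Host header carries the site the user asked for, while the
    Authorization header carries the basic-auth credentials that the
    backend ingress controller checks.
    """
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {site_host}",              # original hostname requested by the user
        f"Authorization: Basic {token}",   # credentials for the backend ingress
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_via_backend(backend_host, site_host, path, username, password):
    """Connect to the backend with SNI = backend_host and Host = site_host."""
    context = ssl.create_default_context()
    with socket.create_connection((backend_host, 443), timeout=10) as raw:
        # server_hostname sets the SNI value in the TLS handshake; it is
        # independent of the Host header sent inside the encrypted stream.
        with context.wrap_socket(raw, server_hostname=backend_host) as tls:
            tls.sendall(build_backend_request(site_host, path, username, password))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)
```

Only `fetch_via_backend` touches the network. `build_backend_request` can be inspected on its own to see which hostname ends up in which place.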
CRDs, Secrets and gitlab pages operator interaction
A single httpd deployment provides:
- a web server which reads vhost configuration from a specific Secret in its namespace. The gitlab-pages-sites operator will populate this Secret with VirtualHost definitions for each user-created GitlabPagesSite resource.
  - Each VHost will be for a hostname of a GitLab Pages site (typically a "custom domain" configured in GitLab).
  - Each VHost authenticates requests with CERN SSO if the site isn't configured for anonymous access and proxies the request to the actual gitlab-pages pods, which are not directly reachable.

To make sure httpd is using the latest configuration, a sidecar container sends a signal to the httpd master process instructing it to reload its configuration.
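The reload mechanism can be sketched roughly as follows. This is an illustrative Python sketch, not the actual sidecar. It assumes the Secret is mounted as a directory of files and that the sidecar knows the PID of the httpd master process. It uses SIGUSR1, the signal Apache httpd documents for a graceful restart; whether the real sidecar uses that signal is an assumption.

```python
import hashlib
import os
import signal


def config_digest(config_dir):
    """Return a digest over the names and contents of the files in config_dir."""
    digest = hashlib.sha256()
    for name in sorted(os.listdir(config_dir)):
        path = os.path.join(config_dir, name)
        if not os.path.isfile(path):
            continue  # skip sub-directories; only regular files (or links to them) count
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")
        with open(path, "rb") as handle:
            digest.update(handle.read())
        digest.update(b"\0")
    return digest.hexdigest()


def reload_if_changed(config_dir, last_digest, pid, send_signal=os.kill):
    """Signal the httpd master process when the configuration has changed.

    Returns the current digest so the caller can pass it to the next call.
    send_signal is injectable so the logic can be exercised without a
    running httpd (hypothetical parameter, for illustration).
    """
    current = config_digest(config_dir)
    if current != last_digest:
        send_signal(pid, signal.SIGUSR1)  # assumed: graceful restart signal
    return current
```

A sidecar built this way would call `reload_if_changed` in a loop with a short sleep, carrying the returned digest from one iteration to the next.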
Backend resolution of gitlab-backend-proxy(-dev).cern.ch considerations:
The backend ingress controller hostname resolves to 3 IP addresses. ProxyPass resolves the hostname only once and keeps using that single IP for the lifetime of the Apache process. This poses challenges in the following scenarios:
- IP changes due to DNS changes
- Endpoint unavailability due to machines going down
- Load balancing the traffic across the 3 endpoints
Implemented solution:
At startup, the Apache container resolves the backend IP addresses and picks one at random. The chosen IP is added to /etc/hosts to force ProxyPass to use it. A virtual host is configured against the backend and used to health-check the endpoint through a custom domain. If the endpoint is down for any of the reasons listed above, the pod is killed and the deployment creates a new one, which in turn selects a backend IP address afresh.
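The startup step can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the container's actual entrypoint: the function names are hypothetical, only IPv4 addresses are considered, and the hostname in the usage note is a placeholder.

```python
import random
import socket


def resolve_ipv4(hostname):
    """Return the distinct IPv4 addresses the hostname currently resolves to."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); for IPv4
    # the sockaddr is an (address, port) pair.
    return sorted({info[4][0] for info in infos})


def pick_backend(hostname, resolver=resolve_ipv4, chooser=random.choice):
    """Resolve the backend hostname and pick one of its addresses at random."""
    addresses = resolver(hostname)
    if not addresses:
        raise RuntimeError(f"no addresses found for {hostname}")
    return chooser(addresses)


def hosts_entry(ip_address, hostname):
    """Format the /etc/hosts line that pins the hostname to the chosen address."""
    return f"{ip_address}\t{hostname}\n"
```

For example, `hosts_entry(pick_backend("backend.example.org"), "backend.example.org")` yields the line to append to /etc/hosts before httpd starts. Because the choice is made once per pod start, a restart triggered by a failed health check is what causes a new address to be selected.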
Secret prefix and count
IMPORTANT!! The httpd.virtualHost.secret.count and httpd.virtualHost.secret.prefix values cannot be changed and deployed without proper preparation. If the count increases, the result will be duplicate vhosts. Also bear in mind that although the secrets are created by this helm chart, they are then populated by gitlab-pages-site-operator. This means that both prefix and count have to be aligned with the gitlab-pages-site-operator configuration.
The current capacity is around 1500 GPS vhosts per secret, with 10 secrets currently configured, which gives us enough room to think about a different approach.
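With the figures above, the total capacity is roughly 1500 × 10 = 15000 vhosts. To illustrate why changing the count is risky, here is a purely hypothetical Python sketch. It assumes secrets are named `<prefix>-<index>` and that each site is assigned to a secret by hashing its hostname modulo the count. Neither the naming scheme nor the assignment rule is confirmed by this document; the operator may work differently.

```python
import hashlib


def secret_names(prefix, count):
    """List the secret names for a given prefix and count (assumed naming scheme)."""
    return [f"{prefix}-{index}" for index in range(count)]


def secret_for_site(hostname, prefix, count):
    """Assign a site to one secret via a stable hash (assumed assignment rule)."""
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % count
    return f"{prefix}-{index}"


def moved_sites(hostnames, prefix, old_count, new_count):
    """Return the sites whose target secret changes when the count changes."""
    return [
        host
        for host in hostnames
        if secret_for_site(host, prefix, old_count)
        != secret_for_site(host, prefix, new_count)
    ]
```

Under this assumed scheme, every site returned by `moved_sites` would be written to a new secret while its old entry still sits in the previous one, so httpd would load the same vhost twice unless the old entries are cleaned up first.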
Future work
The use of a shared secret between the operator and the proxy server is not ideal. It doesn't scale well and requires extra synchronization between the operator and the proxy. A better approach would be something similar to what WebEOS does: using a custom resource to generate vhost config files in a sidecar, which are then mounted in the proxy server.