= The Elixir Cross Referencer
:doctype: book
:pp: {plus}{plus}
:toc:
:toc-placement!:

Elixir is a source code cross-referencer inspired by
https://en.wikipedia.org/wiki/LXR_Cross_Referencer[LXR]. It's written in Python,
and its main purpose is to index every release of a C or C{pp} project (like the
Linux kernel) while keeping a minimal footprint.

It uses Git as a source-code file store and Berkeley DB for cross-reference
data. Internally, it indexes Git _blobs_ rather than trees of files to avoid
duplicating work and data. It has a straightforward data structure
(reminiscent of older LXR releases) to keep queries simple and fast.

You can see it in action at https://elixir.bootlin.com/

link:CHANGELOG.adoc[Changelog]

toc::[]

= Requirements

* Python >= 3.8
* Git >= 1.9
* The Jinja2 and Pygments (>= 2.7) Python libraries
* Berkeley DB (and its Python binding)
* Universal Ctags
* Perl (for non-greedy regexes and automated testing)
* Falcon and `mod_wsgi` (for the REST API)

= Architecture

The shell script (`script.sh`) is the lower layer and provides commands
to interact with Git and other Unix utilities. The Python commands use
the shell script's services to provide access to the annotated source
code and identifier lists (`query.py`) or to create and update the
databases (`update.py`). Finally, the web interface (`web.py`) uses the
query interface to generate HTML pages and to answer REST queries.

When installing the system, you should test each layer manually and make
sure it works correctly before moving on to the next one.

= Manual Installation

== Install Dependencies

____
For Debian
____

----
sudo apt install python3-pip python3-venv libdb-dev python3-dev build-essential universal-ctags perl git apache2 libapache2-mod-wsgi-py3 libjansson4
----

== Download Elixir Project

----
git clone https://github.com/bootlin/elixir.git /usr/local/elixir/
----

== Create a virtualenv for Elixir

----
python3 -m venv /usr/local/elixir/venv
. /usr/local/elixir/venv/bin/activate
pip install -r /usr/local/elixir/requirements.txt
----

== Create directories for project data

----
mkdir -p /path/elixir-data/linux/repo
mkdir -p /path/elixir-data/linux/data
----

== Set environment variables

Two environment variables tell Elixir where to find the project's local Git
repository and its databases:

* `LXR_REPO_DIR` (the Git repository directory for your project)
* `LXR_DATA_DIR` (the database directory for your project)

Now open `/etc/profile` and append the following content.

----
export LXR_REPO_DIR=/path/elixir-data/linux/repo
export LXR_DATA_DIR=/path/elixir-data/linux/data
----

And then run `source /etc/profile`.

== Clone Kernel source code

First clone the master tree released by Linus Torvalds:

----
cd /path/elixir-data/linux
git clone --bare https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git repo
----

Then, you should also declare a `stable` remote corresponding to the `stable`
tree, to get all release updates:

----
cd repo
git remote add stable git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
git fetch stable
----

You can also declare a `history` remote for the old Linux versions that are
not present in the other repositories, to get all the old versions still
available:

----
cd repo
git remote add history https://github.com/bootlin/linux-history.git
git fetch history --tags
----

Feel free to add more remotes in this way, as Elixir will consider tags from
all remotes.

== First Test

----
cd /usr/local/elixir/
./script.sh list-tags
----

== Create Database

----
. ./venv/bin/activate
./update.py <number of threads>
----

____
Generating the full database can take a long time: it takes about 15 hours on a
Xeon E3-1245 v5 to index 1800 tags in the Linux kernel. For that reason, you
may want to tweak the script (for example, by limiting the number of tags with
`head`) in order to test the update and query commands. You can even create a
new Git repository and just create one tag instead of using the official kernel
repository, which is very large.
____

== Second Test

Verify that the queries work:

    $ ./elixir/query.py v4.10 ident raw_spin_unlock_irq C
    $ ./elixir/query.py v4.10 file /kernel/sched/clock.c

NOTE: `v4.10` can be replaced with any other tag.

NOTE: Don't forget to activate the virtual environment!

== Configure httpd

The web interface (`web.py`) is meant to be called from your web server. Since
it includes support for indexing multiple projects, it expects a different
variable (`LXR_PROJ_DIR`) which points to a directory with a specific
structure:

* `<LXR_PROJ_DIR>`
** `<project 1>`
*** `data`
*** `repo`
** `<project 2>`
*** `data`
*** `repo`
** `<project 3>`
*** `data`
*** `repo`

It will then generate the other two variables upon calling the query command.

Now replace `/etc/apache2/sites-enabled/000-default.conf` with
`docker/000-default.conf`.

Note: If using httpd (RedHat/Centos) instead of apache2 (Ubuntu/Debian), the
default config file to edit is `/etc/httpd/conf.d/elixir.conf`.

Finally, start the httpd server.

----
systemctl restart apache2
----

== Configure SELinux policy

When SELinux is enabled, the httpd server can only access a limited set of
directories. If your /path/elixir-data/ is not one of them, requests will fail
with a 500 status code.

To allow the httpd server to access /path/elixir-data/, run the following
command:

----
chcon -R -t httpd_sys_rw_content_t /path/elixir-data/
----

To check that it took effect, run:

----
ls -Z /path/elixir-data/
----

To check the SELinux log entries related to httpd, run:

----
audit2why -a | grep httpd | less
----

== Configure systemd log directory

By default, the error log of Elixir is put in /tmp/elixir-errors. However,
systemd enables PrivateTmp by default.
As a result, the actual error directory will look like
/tmp/systemd-private-xxxxx-httpd.service-xxxx/tmp/elixir-errors. If you want
to disable this behavior, configure httpd.service with the following attribute:

----
PrivateTmp=false
----

== Configuration for other servers

Other HTTP servers (like nginx or lighttpd) may not support WSGI and may
require a separate WSGI server, like uWSGI.

Information about how to configure uWSGI with lighttpd can be found here:
https://redmine.lighttpd.net/projects/lighttpd/wiki/HowToPythonWSGI#Python-WSGI-apps-via-uwsgi-SCGI-FastCGI-or-HTTP-using-the-uWSGI-server

Pull requests with example uWSGI configuration for Elixir are welcome.

= REST API usage

After configuring httpd, you can test the API usage:

== ident query

Send a GET request to
`/api/ident/<Project>/<Ident>?version=<version>&family=<family>`. For example
(the URL is quoted so that the shell does not interpret the `&`):

    curl 'http://127.0.0.1/api/ident/barebox/cdev?version=latest&family=C'

The response body has the following structure:

----
{
    "definitions":
        [{"path": "commands/loadb.c", "line": 71, "type": "variable"}, ...],
    "references":
        [{"path": "arch/arm/boards/cm-fx6/board.c", "line": "64,64,71,72,75", "type": null}, ...]
}
----

= Maintenance and enhancements

== Using a cache to improve performance

At Bootlin, we're using the https://varnish-cache.org/[Varnish http cache]
as a front-end to reduce the load on the server running the Elixir code.

    .-------------.           .---------------.           .-----------------------.
    | Http client | --------> | Varnish cache | --------> | Apache running Elixir |
    '-------------'           '---------------'           '-----------------------'

== Keeping Elixir databases up to date

To keep your Elixir databases up to date and index new versions as they are
released, we suggest using a script like `utils/update-elixir-data`, called
from a daily cron job.

You can set `$ELIXIR_THREADS` if you want to change the number of threads used
by update.py for indexing (by default the number of CPUs on your system).
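
As a rough illustration of the default described above, the following Python sketch resolves the thread count from `ELIXIR_THREADS` and builds the `./update.py <number of threads>` command line. It is not the actual `utils/update-elixir-data` script: the helper names `resolve_thread_count` and `build_update_command` are hypothetical, and falling back to the CPU count for an empty or invalid value is an assumption.

```python
import os


def resolve_thread_count(env=None):
    """Return the indexing thread count (hypothetical helper).

    Uses ELIXIR_THREADS when it holds a positive integer; otherwise
    falls back to the number of CPUs (assumed behavior for bad values).
    """
    env = os.environ if env is None else env
    raw = env.get("ELIXIR_THREADS", "").strip()
    try:
        threads = int(raw)
    except ValueError:
        threads = 0
    if threads > 0:
        return threads
    # os.cpu_count() may return None on some platforms, so default to 1.
    return os.cpu_count() or 1


def build_update_command(env=None):
    """Build the './update.py <number of threads>' argument list."""
    return ["./update.py", str(resolve_thread_count(env))]


if __name__ == "__main__":
    # Print the command a cron wrapper would run from the Elixir directory.
    print(" ".join(build_update_command()))
```

Passing the environment as a plain mapping keeps the helper easy to check without touching the real process environment.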
== Keeping git repository disk usage under control

As you keep updating your git repositories, you may notice that some can become
considerably bigger than they originally were. This seems to happen when a
`gc.log` file appears in a big repository, apparently causing git's garbage
collector (`git gc`) to fail, and therefore causing the repository to consume
disk space at a fast pace every time new objects are fetched.

When this happens, you can save disk space by packing git directories as
follows:

----
cd <bare-repo>
git prune
rm gc.log
git gc --aggressive
----

A second pass with the above commands will save even more space.

To process multiple git repositories in a loop, you may use the
`utils/pack-repositories` script that we provide, run from the directory where
all repositories are found.

= Building Docker images

Dockerfiles are provided in the `docker/` directory. To build the image, run
the following commands:

    # git clone https://github.com/bootlin/elixir
    # docker build -t elixir -f ./elixir/docker/Dockerfile ./elixir/

You can then run the image using `docker run`. Here we mount a host directory
as Elixir data:

    # mkdir ./elixir-data
    # docker run -v ./elixir-data/:/srv/elixir-data -d --name elixir-container elixir

The Docker image does not contain any repositories. To index a repository, you
can use the `index-repository` script. For example, to add the
https://musl.libc.org/[musl] repository, run:

    # docker exec -it -e PYTHONUNBUFFERED=1 elixir-container \
        /bin/bash -c 'export "PATH=/usr/local/elixir/venv/bin:$PATH" ; \
        /usr/local/elixir/utils/index-repository \
        musl https://git.musl-libc.org/git/musl'

Without the `PYTHONUNBUFFERED` environment variable, update logs may show up
with a delay.
Or, to run indexing in a separate container:

    # docker run -e PYTHONUNBUFFERED=1 -v ./elixir-data/:/srv/elixir-data \
        --entrypoint /bin/bash elixir -c \
        'export "PATH=/usr/local/elixir/venv/bin:$PATH" ; \
        /usr/local/elixir/utils/index-repository \
        musl https://git.musl-libc.org/git/musl'

You can also use `utils/index-all-repositories` to start indexing all
officially supported repositories.

After indexing is done, Elixir should be available under the following URL on
your host: http://172.17.0.2/musl/latest/source

If 172.17.0.2 does not answer, you can check the IP address of the container by
running:

    # docker inspect elixir-container | grep IPAddress

== Automatic repository updates

The Docker image does not automatically update repositories by itself. You can,
for example, start `utils/update-elixir-data` in the container (or in a
separate container, with the Elixir data volume/directory mounted) from cron on
the host to periodically update repositories.

== Using Docker image as a development server

You can easily use the Docker image as a development server by following the
steps above, but mounting the Elixir source directory from the host into
`/usr/local/elixir/` in the container when running `docker run elixir`.

Changes in the code made on the host should be automatically reflected in the
container. You can use `apache2ctl` to restart Apache. Error logs are available
in `/var/log/apache2/error.log` within the container.

= Hardware requirements

Performance requirements depend mostly on the amount of traffic that you get on
your Elixir service. However, a fast server also helps for the initial indexing
of the projects.

SSD storage is strongly recommended because of the frequent access to git
repositories.
Here are a few details about the server we use at Bootlin:

* As of July 2019, our Elixir service consumes 17 GB of data (supporting all
projects), or for the Linux kernel alone (version 5.2 being the latest), 12 GB
for indexing data, and 2 GB for the git repository.
* We're using an LXD instance with 8 GB of RAM on a cloud server with 8 CPU
cores running at 3.1 GHz.

= Contributing to Elixir

== Supporting a new project

Elixir has a very simple modular architecture that makes it possible to support
new source code projects by just adding a new file to the Elixir sources.

Elixir's assumptions:

* Project sources have to be available in a git repository
* Each project release is associated with a git tag. Elixir only considers such
tags.

First install Elixir by following the above instructions. See the `projects`
subdirectory for projects that are already supported.

Once Elixir works for at least one project, it's time to clone the git
repository for the project you want to support:

    cd /srv/git
    git clone --bare https://github.com/zephyrproject-rtos/zephyr

After doing this, you may also reference and fetch remotes for this project,
for example corresponding to the `stable` tree for the Linux kernel (see the
instructions for Linux earlier in this document).

Now, in your `LXR_PROJ_DIR` directory, create a new directory for the new
project:

    cd $LXR_PROJ_DIR
    mkdir -p zephyr/data
    ln -s /srv/git/zephyr.git zephyr/repo
    export LXR_DATA_DIR=$LXR_PROJ_DIR/zephyr/data
    export LXR_REPO_DIR=$LXR_PROJ_DIR/zephyr/repo

Now, go back to the Elixir sources and test that tags are correctly extracted:

    ./script.sh list-tags

Depending on how you want to show the available versions on the Elixir pages,
you may have to apply substitutions to each tag string, for example to add a
`v` prefix if missing, for consistency with how other project versions are
shown. You may also decide to ignore specific tags.
All this can be done by redefining the default `list_tags()` function in a new
`projects/<projectname>.sh` file. Here's an example (`projects/zephyr.sh`
file):

    list_tags()
    {
        echo "$tags" | grep -v '^zephyr-v'
    }

Note that `<projectname>` *must* match the name of the directory that you
created under `LXR_PROJ_DIR`.

The next step is to make sure that versions are classified as you wish in the
version menu. This classification work is done through the `list_tags_h()`
function, which generates the output of the `./script.sh list-tags -h` command.
Here's what you get for the Linux project:

    v4 v4.16 v4.16
    v4 v4.16 v4.16-rc7
    v4 v4.16 v4.16-rc6
    v4 v4.16 v4.16-rc5
    v4 v4.16 v4.16-rc4
    v4 v4.16 v4.16-rc3
    v4 v4.16 v4.16-rc2
    v4 v4.16 v4.16-rc1
    ...

The first column is the top-level menu entry for versions. The second one is
the next-level menu entry, and the third one is the actual version that can be
selected from the menu. Note that this third entry must correspond to the exact
name of the tag in git.

If the default behavior is not what you want, you will have to customize the
`list_tags_h()` function.

You should also make sure that Elixir properly identifies the most recent
versions:

    ./script.sh get-latest

If needed, customize the `get_latest()` function.

If you want to enable support for `compatible` properties in Devicetree files,
add `dts_comp_support=1` at the beginning of `projects/<projectname>.sh`.

You are now ready to generate Elixir's database for your new project:

    ./update.py <number of threads>

You can then check that Elixir works through your http server.

== Coding style

If you wish to contribute to Elixir's Python code, please follow the
https://www.python.org/dev/peps/pep-0008/[official coding style for Python].

== How to send patches

The best way to share your contributions with us is to
https://github.com/bootlin/elixir/pulls[file a pull request on GitHub].

= Automated testing

Elixir includes a simple test suite in `t/`.
To run it, from the top-level Elixir directory, run:

    prove

The test suite uses code extracted from Linux v5.4 in `t/tree`.

== Licensing of code in `t/tree`

The copied code is licensed as described in the
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/COPYING[COPYING]
file included with Linux. All the files copied carry SPDX license identifiers
of `GPL-2.0+` or `GPL-2.0-or-later`. Per
https://www.gnu.org/licenses/gpl-faq.en.html#AllCompatibility[GNU's compatibility table],
GPL 2.0+ code can be used under GPLv3 provided the combination is under GPLv3.
Moreover,
https://www.gnu.org/licenses/license-list.en.html#AGPLv3.0[GNU's overview of AGPLv3]
indicates that its terms "effectively consist of the terms of GPLv3" plus the
network-use paragraph. Therefore, the developers have a good-faith belief that
licensing these files under AGPLv3 is authorized. (See also
https://github.com/Freemius/wordpress-sdk/issues/166#issuecomment-310561976[this issue comment]
for another example of a similar situation.)

= License

Elixir is copyright (c) 2017--2020 its contributors. It is licensed AGPLv3.
See the `COPYING` file included with Elixir for details.