= The Elixir Cross Referencer
:doctype: book
:pp: {plus}{plus}
:toc:
:toc-placement!:

Elixir is a source code cross-referencer inspired by
https://en.wikipedia.org/wiki/LXR_Cross_Referencer[LXR]. It's written
in Python and its main purpose is to index every release of a C or C{pp}
project (like the Linux kernel) while keeping a minimal footprint.

It uses Git as a source-code file store and Berkeley DB for cross-reference
data. Internally, it indexes Git _blobs_ rather than trees of files to avoid
duplicating work and data. It has a straightforward data structure
(reminiscent of older LXR releases) to keep queries simple and fast.

You can see it in action on https://elixir.bootlin.com/

link:CHANGELOG.adoc[Changelog]

toc::[]

= Requirements

* Python >= 3.8
* Git >= 1.9
* The Jinja2 and Pygments (>= 2.7) Python libraries
* Berkeley DB (and its Python binding)
* Universal Ctags
* Perl (for non-greedy regexes and automated testing)
* Falcon and `mod_wsgi` (for the REST API)

= Architecture

The shell script (`script.sh`) is the lower layer and provides commands
to interact with Git and other Unix utilities. The Python commands use
the shell script's services to provide access to the annotated source
code and identifier lists (`query.py`) or to create and update the
databases (`update.py`). Finally, the web interface (`web.py`)
uses the query interface to generate HTML pages and to answer REST
queries.

When installing the system, you should test each layer manually and make
sure it works correctly before moving on to the next one.

= Manual Installation

== Install Dependencies

____
For Debian
____

----
sudo apt install python3-pip python3-venv libdb-dev python3-dev build-essential universal-ctags perl git apache2 libapache2-mod-wsgi-py3 libjansson4
----

== Download Elixir Project

----
git clone https://github.com/bootlin/elixir.git /usr/local/elixir/
----

== Create a virtualenv for Elixir

----
python3 -m venv /usr/local/elixir/venv
. /usr/local/elixir/venv/bin/activate
pip install -r /usr/local/elixir/requirements.txt
----

== Create directories for project data

----
mkdir -p /path/elixir-data/linux/repo
mkdir -p /path/elixir-data/linux/data
----

== Set environment variables

Two environment variables are used to tell Elixir where to find the project's
local git repository and its databases:

* `LXR_REPO_DIR` (the git repository directory for your project)
* `LXR_DATA_DIR` (the database directory for your project)

Now open `/etc/profile` and append the following content.

----
export LXR_REPO_DIR=/path/elixir-data/linux/repo
export LXR_DATA_DIR=/path/elixir-data/linux/data
----

And then run `source /etc/profile`.
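
Before moving on, it can help to confirm that both variables are visible and point to real directories. The following is a minimal Python sketch, not part of Elixir; `check_elixir_env` is a hypothetical helper name used only for illustration.

```python
import os


def check_elixir_env(environ=os.environ):
    """Return a list of problems found with LXR_REPO_DIR and LXR_DATA_DIR."""
    problems = []
    for name in ("LXR_REPO_DIR", "LXR_DATA_DIR"):
        value = environ.get(name)
        if not value:
            # Variable is missing or empty
            problems.append(f"{name} is not set")
        elif not os.path.isdir(value):
            # Variable is set but the directory does not exist
            problems.append(f"{name} points to a missing directory: {value}")
    return problems
```

An empty list means both variables are set and both directories exist; otherwise each entry describes one problem.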

== Clone Kernel source code

First clone the master tree released by Linus Torvalds:

----
cd /path/elixir-data/linux
git clone --bare https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git repo
----

Then, you should also declare a `stable` remote branch corresponding to the `stable` tree, to get all release updates:

----
cd repo
git remote add stable git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
git fetch stable
----

Then, you can also declare a `history` remote corresponding to the old Linux versions not present in the other repositories, to get all the old versions that are still available:

----
cd /path/elixir-data/linux/repo
git remote add history https://github.com/bootlin/linux-history.git
git fetch history --tags
----

Feel free to add more remote branches in this way, as Elixir will consider tags from all remote branches.

== First Test

----
cd /usr/local/elixir/
./script.sh list-tags
----

== Create Database

----
. ./venv/bin/activate
./update.py <number of threads>
----

____
Generating the full database can take a long time: it takes about 15 hours on a Xeon E3-1245 v5 to index 1800 tags in the Linux kernel. For that reason, you may want to tweak the script (for example, by limiting the number of tags with a "head") in order to test the update and query commands. You can even create a new Git repository and just create one tag instead of using the official kernel repository which is very large.
____

== Second Test

Verify that the queries work:

 $ ./elixir/query.py v4.10 ident raw_spin_unlock_irq C
 $ ./elixir/query.py v4.10 file /kernel/sched/clock.c

NOTE: `v4.10` can be replaced with any other tag.

NOTE: Don't forget to activate the virtual environment!

== Configure httpd

The CGI interface (`web.py`) is meant to be called from your web
server. Since it includes support for indexing multiple projects,
it expects a different variable (`LXR_PROJ_DIR`) which points to a
directory with a specific structure:

* `<LXR_PROJ_DIR>`
 ** `<project 1>`
  *** `data`
  *** `repo`
 ** `<project 2>`
  *** `data`
  *** `repo`
 ** `<project 3>`
  *** `data`
  *** `repo`

It will then generate the other two variables upon calling the query
command.
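
To illustrate the layout, here is a small Python sketch of how the two per-project variables could be derived from `LXR_PROJ_DIR` and a project name. It only mirrors the directory structure listed above; the actual logic in `web.py` may differ, and `project_dirs` is a hypothetical helper name.

```python
import os


def project_dirs(proj_dir, project):
    """Map a project name to its repo and data directories under proj_dir."""
    base = os.path.join(proj_dir, project)
    return {
        "LXR_REPO_DIR": os.path.join(base, "repo"),
        "LXR_DATA_DIR": os.path.join(base, "data"),
    }
```

For example, with `LXR_PROJ_DIR` set to `/path/elixir-data` and the project `linux`, this yields the same two paths that were exported manually earlier.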

Now replace `/etc/apache2/sites-enabled/000-default.conf` with `docker/000-default.conf`.
Note: If you are using httpd (Red Hat/CentOS) instead of apache2 (Ubuntu/Debian),
the default config file to edit is `/etc/httpd/conf.d/elixir.conf`.

Finally, start the httpd server.

----
systemctl restart apache2
----


== Configure SELinux policy

When running under systemd with SELinux enabled, the httpd server can only access a limited set of directories.
If /path/elixir-data/ is not one of them, requests will fail with a 500 status code.

To allow the httpd server to access /path/elixir-data/, run the following command:
----
chcon -R -t httpd_sys_rw_content_t /path/elixir-data/
----

To check that the change took effect, run the following command:
----
ls -Z /path/elixir-data/
----

To check the SELinux log entries related to httpd, run the following command:
----
audit2why -a | grep httpd | less
----

== Configure systemd log directory

By default, Elixir writes its error log to /tmp/elixir-errors.
However, systemd enables PrivateTmp by default, so the actual error directory
will look like /tmp/systemd-private-xxxxx-httpd.service-xxxx/tmp/elixir-errors.
If you want to disable this, configure httpd.service with the following attribute:
----
PrivateTmp=false
----

== Configuration for other servers

Other HTTP servers (like nginx or lighttpd) may not support WSGI and may require a separate WSGI server, like uWSGI.

Information about how to configure uWSGI with lighttpd can be found here:
https://redmine.lighttpd.net/projects/lighttpd/wiki/HowToPythonWSGI#Python-WSGI-apps-via-uwsgi-SCGI-FastCGI-or-HTTP-using-the-uWSGI-server

Pull requests with example uWSGI configuration for Elixir are welcome.

= REST API usage

After configuring httpd, you can test the API usage:

== ident query

Send a GET request to `/api/ident/<Project>/<Ident>?version=<version>&family=<family>`.
For example (the URL is quoted so that the shell does not interpret the `&`):

 curl 'http://127.0.0.1/api/ident/barebox/cdev?version=latest&family=C'

The response body is of the following structure:

----
{
    "definitions":
        [{"path": "commands/loadb.c", "line": 71, "type": "variable"}, ...],
    "references":
        [{"path": "arch/arm/boards/cm-fx6/board.c", "line": "64,64,71,72,75", "type": null}, ...]
}
----
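
As a client-side illustration, the following Python sketch builds the query URL and turns the `references` entries into integer line numbers. It assumes the response has the structure shown above, with `line` in each reference being a comma-separated string; `ident_url` and `reference_lines` are hypothetical helper names, not part of Elixir.

```python
import json
from urllib.parse import quote, urlencode


def ident_url(base, project, ident, version="latest", family="C"):
    """Build the ident query URL for a given project and identifier."""
    query = urlencode({"version": version, "family": family})
    return f"{base}/api/ident/{quote(project)}/{quote(ident)}?{query}"


def reference_lines(body):
    """Return (path, [line numbers]) pairs from a response body (JSON text)."""
    data = json.loads(body)
    result = []
    for ref in data["references"]:
        # "line" is a comma-separated string such as "64,64,71,72,75"
        numbers = [int(n) for n in str(ref["line"]).split(",")]
        result.append((ref["path"], numbers))
    return result
```

The body passed to `reference_lines` can come from any HTTP client, for example the output of the `curl` command above.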

= Maintenance and enhancements

== Using a cache to improve performance

At Bootlin, we're using the https://varnish-cache.org/[Varnish http cache]
as a front-end to reduce the load on the server running the Elixir code.

 .-------------.           .---------------.           .-----------------------.
 | Http client | --------> | Varnish cache | --------> | Apache running Elixir |
 '-------------'           '---------------'           '-----------------------'

== Keeping Elixir databases up to date

To keep your Elixir databases up to date and index newly released versions,
we suggest using a script like `utils/update-elixir-data`, called
from a daily cron job.

You can set `$ELIXIR_THREADS` if you want to change the number of threads used by
update.py for indexing (by default the number of CPUs on your system).
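
The fallback described above can be pictured with the following Python sketch. It is an illustration only: `utils/update-elixir-data` is a separate script whose actual logic may differ, and `indexing_threads` is a hypothetical helper name.

```python
import os


def indexing_threads(environ=os.environ):
    """Return ELIXIR_THREADS if it is set, else the number of CPUs."""
    value = environ.get("ELIXIR_THREADS")
    if value:
        return int(value)
    # os.cpu_count() can return None, so fall back to a single thread
    return os.cpu_count() or 1
```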

== Keeping git repository disk usage under control

As you keep updating your git repositories, you may notice that some can become
considerably bigger than they originally were. This seems to happen when a `gc.log`
file appears in a big repository, apparently causing git's garbage collector (`git gc`)
to fail, and therefore causing the repository to consume disk space at a fast
pace every time new objects are fetched.

When this happens, you can save disk space by packing git directories as follows:

----
cd <bare-repo>
git prune
rm gc.log
git gc --aggressive
----

Actually, a second pass with the above commands will save even more space.

To process multiple git repositories in a loop, you may use the
`utils/pack-repositories` script that we provide, run from the directory
where all repositories are found.

= Building Docker images

Dockerfiles are provided in the `docker/` directory.
To build the image, run the following commands:

 # git clone https://github.com/bootlin/elixir
 # docker build -t elixir -f ./elixir/docker/Dockerfile ./elixir/

You can then run the image using `docker run`.
Here we mount a host directory as Elixir data:

 # mkdir ./elixir-data
 # docker run -v ./elixir-data/:/srv/elixir-data -d --name elixir-container elixir

The Docker image does not contain any repositories.
To index a repository, you can use the `index-repository` script.
For example, to add the https://musl.libc.org/[musl] repository, run:

 # docker exec -it -e PYTHONUNBUFFERED=1 elixir-container \
    /bin/bash -c 'export "PATH=/usr/local/elixir/venv/bin:$PATH" ; \
    /usr/local/elixir/utils/index-repository \
    musl https://git.musl-libc.org/git/musl'

Without the `PYTHONUNBUFFERED` environment variable, update logs may show up with a delay.

Or, to run indexing in a separate container:

 # docker run -e PYTHONUNBUFFERED=1 -v ./elixir-data/:/srv/elixir-data \
    --entrypoint /bin/bash elixir -c \
    'export "PATH=/usr/local/elixir/venv/bin:$PATH" ; \
    /usr/local/elixir/utils/index-repository \
    musl https://git.musl-libc.org/git/musl'

You can also use `utils/index-all-repositories` to start indexing all officially supported repositories.

After indexing is done, Elixir should be available under the following URL on your host:
http://172.17.0.2/musl/latest/source

If 172.17.0.2 does not answer, you can check the IP address of the container by running:

 # docker inspect elixir-container | grep IPAddress

== Automatic repository updates

The Docker image does not automatically update repositories by itself.
You can, for example, start `utils/update-elixir-data` in the container (or in a separate container, with Elixir data volume/directory mounted)
from cron on the host to periodically update repositories.

== Using Docker image as a development server

You can easily use the Docker image as a development server by following the steps above, but mounting the Elixir source directory from the host
into `/usr/local/elixir/` in the container when running `docker run elixir`.

Changes in the code made on the host should be automatically reflected in the container.
You can use `apache2ctl` to restart Apache.
Error logs are available in `/var/log/apache2/error.log` within the container.

= Hardware requirements

Performance requirements depend mostly on the amount of traffic that you get
on your Elixir service. However, a fast server also helps for the initial
indexing of the projects.

SSD storage is strongly recommended because of the frequent access to
git repositories.

Here are a few details about the server we're using at Bootlin:

* As of July 2019, our Elixir service consumes 17 GB of data (supporting all projects).
The Linux kernel alone (version 5.2 being the latest) accounts for 12 GB of indexing data
and 2 GB for the git repository.
* We're using an LXD instance with 8 GB of RAM on a cloud server with 8 CPU cores
running at 3.1 GHz.

= Contributing to Elixir

== Supporting a new project

Elixir has a very simple modular architecture that makes it possible to support
new source code projects by just adding a new file to the Elixir sources.

Elixir's assumptions:

* Project sources have to be available in a git repository
* All project releases are associated with a given git tag. Elixir
only considers such tags.

First, install Elixir by following the above instructions.
See the `projects` subdirectory for projects that are already supported.

Once Elixir works for at least one project, it's time to clone the git
repository for the project you want to support:

 cd /srv/git
 git clone --bare https://github.com/zephyrproject-rtos/zephyr

After doing this, you may also reference and fetch remote branches for this project,
for example corresponding to the `stable` tree for the Linux kernel (see the
instructions for Linux earlier in this document).

Now, in your `LXR_PROJ_DIR` directory, create a new directory for the
new project:

 cd $LXR_PROJ_DIR
 mkdir -p zephyr/data
 ln -s /srv/git/zephyr.git zephyr/repo
 export LXR_DATA_DIR=$LXR_PROJ_DIR/zephyr/data
 export LXR_REPO_DIR=$LXR_PROJ_DIR/zephyr/repo

Now, go back to the Elixir sources and test that tags are correctly
extracted:

 ./script.sh list-tags

Depending on how you want to show the available versions on the Elixir pages,
you may have to apply substitutions to each tag string, for example to add
a `v` prefix if missing, for consistency with how other project versions are
shown. You may also decide to ignore specific tags. All this can be done
by redefining the default `list_tags()` function in a new `projects/<projectname>.sh`
file. Here's an example (`projects/zephyr.sh` file):

 list_tags()
 {
     echo "$tags" |
     grep -v '^zephyr-v'
 }

Note that `<projectname>` *must* match the name of the directory that
you created under `LXR_PROJ_DIR`.

The next step is to make sure that versions are classified as you wish
in the version menu. This classification work is done through the
`list_tags_h()` function which generates the output of the `./script.sh list-tags -h`
command. Here's what you get for the Linux project:

 v4 v4.16 v4.16
 v4 v4.16 v4.16-rc7
 v4 v4.16 v4.16-rc6
 v4 v4.16 v4.16-rc5
 v4 v4.16 v4.16-rc4
 v4 v4.16 v4.16-rc3
 v4 v4.16 v4.16-rc2
 v4 v4.16 v4.16-rc1
 ...

The first column is the top level menu entry for versions.
The second one is the next level menu entry, and
the third one is the actual version that can be selected by the menu.
Note that this third entry must correspond to the exact
name of the tag in git.
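
To make the three-column format concrete, here is a minimal Python sketch that groups such lines into a two-level menu. It is not Elixir's own code; `build_version_menu` is a hypothetical helper that only assumes the whitespace-separated layout shown above.

```python
def build_version_menu(output):
    """Group `list-tags -h` style lines into {top: {sub: [tags]}}."""
    menu = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        top, sub, tag = fields
        # First column: top-level entry; second: submenu; third: git tag
        menu.setdefault(top, {}).setdefault(sub, []).append(tag)
    return menu
```

Tags keep the order in which they appear in the input, so any sorting done by `list_tags_h()` is preserved.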

If the default behavior is not what you want, you will have
to customize the `list_tags_h()` function.

You should also make sure that Elixir properly identifies
the most recent versions:

 ./script.sh get-latest

If needed, customize the `get_latest()` function.

If you want to enable support for `compatible` properties in Devicetree files,
add `dts_comp_support=1` at the beginning of `projects/<projectname>.sh`.

You are now ready to generate Elixir's database for your
new project:

 ./update.py <number of threads>

You can then check that Elixir works through your HTTP server.

== Coding style

If you wish to contribute to Elixir's Python code, please
follow the https://www.python.org/dev/peps/pep-0008/[official coding style for Python].

== How to send patches

The best way to share your contributions with us is to https://github.com/bootlin/elixir/pulls[file a pull
request on GitHub].

= Automated testing

Elixir includes a simple test suite in `t/`.  To run it,
from the top-level Elixir directory, run:

 prove

The test suite uses code extracted from Linux v5.4 in `t/tree`.

== Licensing of code in `t/tree`

The copied code is licensed as described in the https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/COPYING[COPYING] file included with
Linux.  All the files copied carry SPDX license identifiers of `GPL-2.0+` or
`GPL-2.0-or-later`.  Per https://www.gnu.org/licenses/gpl-faq.en.html#AllCompatibility[GNU's compatibility table], GPL 2.0+ code can be used
under GPLv3 provided the combination is under GPLv3.  Moreover, https://www.gnu.org/licenses/license-list.en.html#AGPLv3.0[GNU's overview
of AGPLv3] indicates that its terms "effectively consist of the terms of GPLv3"
plus the network-use paragraph.  Therefore, the developers have a good-faith
belief that licensing these files under AGPLv3 is authorized.  (See also https://github.com/Freemius/wordpress-sdk/issues/166#issuecomment-310561976[this
issue comment] for another example of a similar situation.)

= License

Elixir is copyright (c) 2017--2020 its contributors.  It is licensed under AGPLv3.
See the `COPYING` file included with Elixir for details.