| Name | Date | Size | #Lines | LOC |
| --- | --- | --- | --- | --- |
| .. | 03-Dec-2018 | - | | |
| .eclipse/ | 17-Jan-2018 | - | | |
| closed/ | 17-Jan-2018 | - | | |
| exception_lists/ | 08-Nov-2018 | - | | |
| tools/ | 17-Jan-2018 | - | | |
| usr/src/ | 17-Jan-2018 | - | | |
| .cproject-template | 17-Jan-2018 | 20.8 KiB | 401 | 399 |
| .delphixrc | 17-Jan-2018 | 217 | 11 | 9 |
| .hooksconfig | 08-Nov-2018 | 12.9 KiB | 387 | 382 |
| .project-template | 17-Jan-2018 | 2.2 KiB | 79 | 78 |
| .reviewboardrc | 17-Jan-2018 | 87 | 6 | 4 |
| README.md | 17-Jan-2018 | 18.9 KiB | 244 | 162 |
| build_setup.sh | 17-Jan-2018 | 1.2 KiB | 56 | 28 |
| eclipse_setup.sh | 17-Jan-2018 | 362 | 15 | 9 |
| eclipse_zfs_make.sh | 17-Jan-2018 | 536 | 23 | 1 |
| fix_closed_bins.sh | 17-Jan-2018 | 3.6 KiB | 132 | 94 |
| git_hook_setup.sh | 17-Jan-2018 | 1 KiB | 39 | 14 |
| workspace_setup.sh | 17-Jan-2018 | 620 | 23 | 2 |

README.md

<!---
CDDL HEADER START

This file and its contents are supplied under the terms of the
Common Development and Distribution License ("CDDL"), version 1.0.
You may only use this file in accordance with the terms of version
1.0 of the CDDL.

A full copy of the text of the CDDL should have accompanied this
source.  A copy of the CDDL is also available via the Internet at
http://www.illumos.org/license/CDDL.

CDDL HEADER END

Copyright (c) 2017 by Delphix. All rights reserved.
-->

Delphix OS
==========

This is the Delphix operating system, and it is maintained by the Systems Platform team. This document describes the tools and processes Delphix employees use to modify this codebase.

Development
-----------

To get started developing Delphix OS, you should create a local clone of the `git-utils` repository and add the `bin` subdirectory to your shell's `PATH`. This will give you access to our automated build tools:

- `git zfs-make`: Running this from your OS repository will give you a quick build of some commonly-modified parts of the OS (including, but not limited to, ZFS). It usually takes about 15 minutes the first time you run it on a new branch, and then about 2 minutes for subsequent builds. You can also use it to run our various linters / checkers with the `-l`, `-n`, and `-p` options.
- `git zfs-load`: This takes build artifacts created by a previous `git zfs-make` and loads them into a VM that you specify. This will also reboot the machine you load the bits onto so that kernel modifications will take effect (unless you specify `-n`).
- `git build-os`: This command runs a full, clean build of the OS through our Jenkins automation server and writes the output to a location of your choosing, specified through the `--archive-host` and `--archive-dir` parameters. This generally takes around 45 minutes to run. You will need your own `jenkins`-accessible directory on a build server to dump the resulting files into, which you can create by running `/usr/local/bin/create_data_dir $USER` on the most recent build server and then making a subdirectory which is `chown`ed to the `jenkins` user.

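As a minimal sketch of the `PATH` setup described above, the snippet below could go in a shell startup file. The clone location `~/git-utils` and the `GIT_UTILS_BIN` variable are assumptions made for illustration. The closing comment about how `git` resolves the subcommands is also a presumption about how the tools are packaged, not something this README states.

```shell
# Assumption: git-utils was cloned to ~/git-utils; override GIT_UTILS_BIN if not.
GIT_UTILS_BIN="${GIT_UTILS_BIN:-$HOME/git-utils/bin}"

# Prepend the directory only if it is not already on PATH, so that sourcing
# this file repeatedly does not keep growing PATH.
case ":$PATH:" in
*":$GIT_UTILS_BIN:"*) ;;
*) PATH="$GIT_UTILS_BIN:$PATH" ;;
esac
export PATH

# git looks for executables named git-<subcommand> on PATH, so presumably
# "git zfs-make -h" resolves to a script in that directory once it is listed.
```
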
The general rule for C style is to follow the style of whatever code you're modifying and to use C99 features whenever possible, but ideally all code we touch would adhere to the *C Style and Coding Standards for SunOS*, which is stored at `tools/cstyle.pdf` in this repository. Try to follow that unless the code you're working in uses drastically different style or you have a good reason not to (for example, the code was ported from another OS and you want to reduce future merge pain by keeping the style the same as the original).

We also have a few automated testing tools:

- `git zfs-test`: If you have a VM that's been set up using `git zfs-load` and you want to kick off an automated test run, this will start a job asynchronously on our Jenkins server. You can use it for both `zloop` and `zfstest` runs (see **Testing** below for more details on these).
- `git zfs-perf-test`: This runs the performance suite from `zfstest` on dedicated hardware to help you get a reliable measure of how your changes impact ZFS performance. These tests generate artificial load through `fio`, so none of them is a substitute for real-world testing, but they are a good set of benchmarks to start with.
- `git zfs-precommit`: This is a way to kick off the equivalent of a `git build-os` followed by a `git zfs-test`. For most bugs you should have a passing version of this before you post your code for review (unless you explicitly label the review as a work in progress).

All of the commands above have `-h` options that you can pass to them to learn more about their capabilities.

Here are some additional resources for learning about the implementation of a specific subsystem:

- Many ZFS features are documented through links on the [OpenZFS website](http://www.open-zfs.org/wiki/Developer_resources)
- For device drivers check out the illumos [Writing Device Drivers](https://illumos.org/books/wdd/) guide
- Many other subsystems are described in detail in the [Solaris Internals](https://www.amazon.com/Solaris-Internals-OpenSolaris-Architecture-paperback/dp/0134185978) book

Testing
-------

### `zfstest`

The `zfstest` suite is a set of `ksh` scripts that were created to exercise as much user-facing functionality in ZFS as possible. As such, there are *many* test cases, and it takes several hours to run all of them. The set of tests that will be run is specified by a runfile, and the one used by all of our tools is `usr/src/test/zfs-tests/runfiles/delphix.run`. You can create your own runfiles and run `zfstest` on them manually using the command `/opt/zfs-tests/bin/zfstest -a -c custom.run`.

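A custom runfile might look roughly like the sketch below, which writes one to a scratch location and runs it only if the suite is installed. The `RUNFILE` variable and the choice of test group are assumptions for illustration, and the section layout mirrors the upstream illumos runfiles rather than anything specified in this README. Check `delphix.run` for the authoritative settings.

```shell
# Write a small custom runfile. RUNFILE and the chosen test group are
# illustrative assumptions; the layout mirrors the upstream illumos runfiles.
RUNFILE="${RUNFILE:-$(mktemp /tmp/custom.run.XXXXXX)}"

cat > "$RUNFILE" <<'EOF'
[DEFAULT]
pre = setup
quiet = False
pre_user = root
user = root
timeout = 600
post_user = root
post = cleanup
outputdir = /var/tmp/test_results

[/opt/zfs-tests/tests/functional/cli_root/zpool_create]
tests = ['zpool_create_001_pos', 'zpool_create_002_pos']
EOF

# Only attempt the run where the test suite is actually installed.
if [ -x /opt/zfs-tests/bin/zfstest ]; then
    /opt/zfs-tests/bin/zfstest -a -c "$RUNFILE"
else
    echo "zfstest not installed; runfile written to $RUNFILE"
fi
```
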
There are a few known failures in the default set of tests, and we keep track of these in our bug tracker by filing a bug for the problem and adding "labels" with the names of the tests that fail as a result of the bug. If you run `zfstest` through Jenkins, the Jenkins job will look at the set of failed tests and only mark the run as failed if there are some unknown ones (that your code probably introduced). Occasionally you may need to search the bug tracker manually for a failed test, for example to see if a bug was recently closed and you haven't pulled in the fix yet.

### `ztest`

`ztest` is a userland program that runs many `zfs`- and `zpool`-like operations randomly and in parallel, providing a kind of stress test. It compiles against a userland library called `libzpool` that includes most of ZFS's kernel implementation, and it uses `libfakekernel` to replace the parts of the kernel that ZFS depends on.

There are several big reasons to use (and add new tests to) `ztest`:

- if you're modifying a really important part of the system, you can use `ztest` to test your changes without fear of making your VM inaccessible
- you can access functionality which isn't available to the `zfstest` suite because you're calling into the implementation directly
- `ztest` exercises data races and other interactions between features in a way that is hard to mimic manually, or through shell commands in a `zfstest` test

However, it has a couple of limitations, too:

- dependencies outside of ZFS are mocked out, so you don't get their real behaviors
- the ZFS POSIX Layer (ZPL) currently isn't included in `ztest`
- the logic for `zfs send` and `zfs receive` currently isn't included in `ztest`

#### `zloop`

`zloop` is a convenient wrapper around `ztest` that allows you to run it for a certain amount of time and switch between pool configurations every few minutes. It will also continue running even if an individual `ztest` run fails, which is useful for kicking off a lot of testing at once and reviewing the results in bulk later on.

Integration Process
-------------------

When your code is passing tests, you can use the `git review` tool to post the diff to our peer review tool. It will not be publicly visible until you hit the "Publish" button. Before you do so, there are a couple of requirements:

- add a good description of the bug, your fix for it, and any notes to make reviewing your code easier
- add the relevant reviewers (specific people, or larger groups like "zfs")
- add a link to a passing `git zfs-precommit` Jenkins job in the "Testing" section (or a failing one along with an explanation of why the failures are unrelated to your changes)

When you're ready, publish the review. If you don't get any comments within a couple of days, it may be worth pinging your reviewers directly so they know you need their help. You will need at least two reviewers to push, and one of those must be a gatekeeper.

When you have all your reviews, you're ready to push. We have some custom git hooks that will make sure your commit message and code changes follow our internal guidelines, and that the bug you're fixing has all the necessary fields filled out. If there's an issue, `git push` will emit a message describing the problem. If you're getting a warning that you would like to ignore, you can run `git push` a second time to bypass it, but errors must be resolved before pushing.

### Backporting

When a bug affects previous releases of the Delphix Engine, we may want to apply the same fix to a previous release. This involves TODO

Open Source
-----------

Delphix OS is open source, but to make our code more useful to the community we push nearly all our bug fixes and (at a slightly slower cadence) features to the repository we're based on, [illumos](https://github.com/illumos/illumos-gate). We use [this document](https://docs.google.com/document/d/1fUIOtDvVvyA87L8QaaCMpAQaC_a5P51Sbs-5906Pv_Y) to track our upstreaming activity. The things we don't upstream are:

- features that haven't shipped yet (to ensure their stability)
- *[rare]* features that aren't general enough to be useful outside of Delphix
- *[rare]* integrations with infrastructure that's only available inside Delphix
- *[rare]* bug fixes which are too hacky or Delphix-specific to upstream

We also pull in changes from upstream, and we call this activity "doing an illumos sync". Although we are some of the primary contributors to ZFS and the paravirtual device drivers in illumos, other members of the community provide additional features and bug fixes to many other parts of the system that we rely on, so this is a very valuable activity for us. To stay informed about what's going on in the broader illumos community, you can sign up for the [developer mailing list](http://wiki.illumos.org/display/illumos/illumos+Mailing+Lists).

Debugging
---------

The OS has its own set of debugging tools. We don't have a source-level debugger like `gdb`, so if that's what you're used to there'll be a slight learning curve.

### DTrace

DTrace is a framework that allows you to run simple callbacks when various events happen, which allows you to programmatically debug what a program or the kernel is doing. The most useful events tend to be standard probe points that have been purposely placed into the kernel, the entry or exit of kernel functions, and a timer event that fires *N* times a second. There are many good resources on DTrace:

- [DTrace Bootcamp](http://dtrace.org/resources/ahl/dtrace_course.2005.8.18.pdf) by Adam Leventhal
- [DTrace by Example](http://www.oracle.com/technetwork/server-storage/solaris/dtrace-tutorial-142317.html) by Ricky Weisner
- [Summary page with 1-liners](http://www.brendangregg.com/dtrace.html) by Brendan Gregg
- The [illumos DTrace guide](http://dtrace.org/guide/chp-intro.html)
- [Delphix enhancements to DTrace](https://docs.google.com/presentation/d/1sRJTlZD6wt937Nn2wWy0010BbvzJ3WOm4tIVLOgrrnM) by Matt Ahrens
- Many, many undocumented ZFS-related scripts in Matt's home directory

Because all the interesting stuff in `ztest` happens in subprocesses, running `dtrace` against it can be tricky. You should copy the format of `~mahrens/follow.d` to invoke your own `child.d` script on every subprocess that gets spawned from the main `ztest` instance.

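To make the three kinds of events mentioned at the start of this section concrete, here is a small sketch that writes a D script with one clause of each kind: a standard probe point, a kernel function entry, and a timer. The scratch file name, the `DSCRIPT` variable, the choice of `spa_sync()` as the traced function, and the 10-second duration are all assumptions picked for illustration. This is not one of the scripts referenced above.

```shell
# Write an example D script to a scratch file. The file name, the choice of
# spa_sync(), and the 10-second duration are illustrative assumptions.
DSCRIPT="${DSCRIPT:-$(mktemp /tmp/events.d.XXXXXX)}"

cat > "$DSCRIPT" <<'EOF'
/* Standard probe point: count block I/O requests by process name. */
io:::start
{
	@ios[execname] = count();
}

/* Kernel function entry: count calls to spa_sync() by kernel stack. */
fbt::spa_sync:entry
{
	@calls[stack()] = count();
}

/* Timer: sample on-CPU kernel stacks 997 times a second on every CPU. */
profile-997
/arg0/
{
	@oncpu[stack()] = count();
}

/* Stop after 10 seconds; DTrace prints the aggregations on exit. */
tick-10s
{
	exit(0);
}
EOF

echo "run with: sudo dtrace -s $DSCRIPT"
```
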
### `mdb` and `kmdb`

The program `mdb` allows you to debug a userland core file or kernel crash dump after a failure. Core and dump files are configured using `coreadm` / `dumpadm` to be written to `/var/crash/`. To debug a core file you can run MDB directly on the file like `mdb /var/crash/core.xyz.1234`, and for crash dumps you first expand them using `sudo savecore -vf vmdump.0` and then run `mdb 0` to start debugging. `mdb -p` and `mdb -k` allow you to connect to a running process's / kernel's address space in a similar way.

`kmdb` is similar to `mdb -k`, but it additionally allows you to do live kernel debugging with breakpoints. To use `kmdb`, you must enable it with one of these methods:

- Hit space / escape during boot when the illumos boot menu is displayed on the console, navigate to the `Configure Boot Options` screen, and turn the `kmdb` option on. (Note that you may have to use Ctrl-H instead of backspace to get back to the main menu to continue booting. Also, this method does not persist across reboots.)
- Create a file in `/boot/conf.d/` which has the line `boot_kmdb=YES`.

After you've enabled `kmdb` and rebooted, you can drop into the debugger in a few different ways:

- To drop in as soon as possible during boot (to debug a boot failure, etc.) turn the bootloader option for `Debug` on, or add another line to the bootloader's config file with `boot_debug=YES`.
- To drop in at an arbitrary time of your choosing, run `sudo mdb -K` from the console after the machine is booted.
- To drop in during a non-maskable interrupt (especially useful for debugging hangs), add `set apic_kmdb_on_nmi=1` to your `/etc/system` file (and remove the line which sets `apic_panic_on_nmi` if it's present) and reboot. You can generate NMIs on demand through whatever hypervisor you're using.

You can only issue `kmdb` commands via the console (i.e. not over SSH), so make sure you have access to the console before you try to drop into it!

There are many good guides introducing you to the basics of MDB commands:

- [The Modular Debugger](http://www.solarisinternals.com/si/reading/chpt_mdb_os.pdf), a chapter from *Solaris Internals*
- [Diagnosing kernel hangs/panics with kmdb and moddebug](https://blogs.oracle.com/dmick/entry/diagnosing_kernel_hangs_panics_with) by Dan Mick
- [Solaris Core Analysis, Part 1: mdb](http://cuddletech.com/?p=436) by Ben Rockwood
- [An MDB reference](http://www.solarisinternals.com/si/tools/mdb/index.php) by Jonathan Adams
- GDB to MDB, [Part 1](https://blogs.oracle.com/eschrock/entry/gdb_to_mdb) and [Part 2](https://blogs.oracle.com/eschrock/entry/gdb_to_mdb_migration_part) by Eric Schrock

A lot of the most useful parts of MDB are commands (known as `dcmd`s in MDB-speak) that have been written in concert with some kernel feature to help you visualize what's going on. For instance, there are many custom `dcmd`s you can use to look at the in-memory state of ZFS, the set of interrupts that are available to you, etc. MDB also uses the idea of pipelines (as in `bash`), so you can pipe the results of one command to another. There's also a special kind of command called a "walker" that can generate many outputs, and these are frequently used as inputs to pipes. For a mildly contrived example, if you want to pretty-print every `arc_buf_t` object in the kernel's memory space, you could do `::walk arc_buf_t | ::print arc_buf_t`. You can even pipe out to actual shell commands using `!` instead of `|` as the pipeline character if you want to use `grep` or `awk` to do a little postprocessing of your data (although this is not available from `kmdb`).

Finally, to enable some extremely useful `dcmd`s, turn on `kmem` debugging by putting `set kmem_flags=0xf` in `/etc/system` and then rebooting. After doing this, the command `::whatis` can tell you the stack trace where every buffer was allocated or freed, and `::findleaks` can be used to search the heap for memory leaks.

Here are some of the most commonly useful `mdb` commands:
```
# description of how this core file was created
::status
# print assertion failures, system logs, etc.
::msgbuf
# print panicstr (last resort if ::status / ::msgbuf don't work)
*panicstr/s
# backtrace of current thread
::stack
# print register state
::regs
# print contents of register rax (can also do "<rax::print <type>")
<rax=Z
# allocation backtrace for a buffer located at 0x1234
0x1234::whatis
```

Here are a few to help you learn about new features of MDB:
```
# show the formats you can use with the "<addr>/<format>" syntax
::formats
# print out all dcmds with descriptions
::dcmds
# print out all walkers
::walkers
# get information about a dcmd "::foo"
::help foo
```

Here are some OS-specific `dcmd`s:
```
# view interrupt table
::interrupts
# similar to "prtconf", prints device tree
::prtconf
# similar to "zpool status"
::spa -v
# prints out all stacks with >= 1 frame in the zfs kernel module
::stacks -m zfs
# prints human-readable block pointer located at memory address 0x1234
0x1234::blkptr
# see status of all active zios
::zio_state -r
# see the debug messages ZFS has logged recently
::zfs_dbgmsg
# look at SCSI state
::walk sd_state | ::sd_state
```

Here are some commands we use commonly to debug kernel and user out-of-memory issues:
```
# look at Java threads in kernel dump
::pgrep java | ::walk thread | ::findstack -v
# get Java gcore from kernel dump (although often it is incomplete)
::pgrep java | ::gcore
# high-level view of kernel memory use
::memstat
# allocation statistics for kmem (kernel) or umem (userland)
::kmastat
::umastat
# allocation backtraces for largest kmem (kernel) or umem (userland) allocations
::kmausers
::umausers
# sum up anonymous memory used by all Java processes
::pgrep java | ::pmap -q ! grep anon | awk 'BEGIN{sum=0} {sum+=$3} END{print(sum)}'
```

Here are the miscellaneous useful ones you may want one day:
```
# expand terminal width to 1024 columns
1024 $w
# ignore all asserts from here onwards
aok/w 1
```

### `zdb`

`zdb` is a ZFS-specific tool which is useful for inspecting logical on-disk structures. When you add a new on-disk feature to ZFS, you should augment `zdb` to print the format intelligently to help you debug issues later on. `zdb` can also be used to verify the on-disk format of a pool, and it is used from `ztest` for this reason at the end of each run.

### `zinject`

`zinject` is another ZFS-specific tool which is useful for simulating slow or failing disks. You can use it to write corruptions into specific objects or device labels, add artificial I/O latency for performance testing, or even pause all I/Os.

### Early boot debugging

First, if the thing you're working on is part of ZFS, you may be able to test it using `ztest` (see above) so that you don't have to deal with early boot debugging.

If that's not possible and you're working on something that you know is likely to cause boot problems when you mess up, it's a good idea to preemptively add the lines `boot_{kmdb,verbose,debug}=YES` to a file in `/boot/conf.d/` and `set apic_kmdb_on_nmi=1` in `/etc/system` as described above. This will allow you to set breakpoints immediately at boot, and will drop into `kmdb` if there is a panic or an NMI. For more information about how to configure the bootloader, see [this guide](https://docs.google.com/document/d/1FxZKpymWf5EnR9eohH-B3EesvyNZUIkzu9Yo9sKWm40).

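The preemptive setup above can be sketched as a short script. It stages the files under a `ROOT` directory, which defaults to a scratch directory so you can inspect the result before touching a real VM. The `ROOT` variable and the file name `debug.conf` are assumptions made for illustration. The sketch also does not remove an existing `apic_panic_on_nmi` line, which you would still need to do by hand.

```shell
# Stage the early-boot debugging settings under ROOT. ROOT defaults to a
# scratch directory so the result can be inspected first; the ROOT variable
# and the file name "debug.conf" are illustrative assumptions.
ROOT="${ROOT:-$(mktemp -d)}"
mkdir -p "$ROOT/boot/conf.d" "$ROOT/etc"

# boot_{kmdb,verbose,debug}=YES is shorthand for these three loader settings.
for opt in kmdb verbose debug; do
    echo "boot_${opt}=YES"
done > "$ROOT/boot/conf.d/debug.conf"

# Drop into kmdb on a non-maskable interrupt; add the line only once.
touch "$ROOT/etc/system"
grep -q '^set apic_kmdb_on_nmi=1$' "$ROOT/etc/system" ||
    echo 'set apic_kmdb_on_nmi=1' >> "$ROOT/etc/system"

echo "staged under $ROOT; review, then apply the same changes on the VM as root"
```
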
A common hurdle when setting `kmdb` breakpoints during boot is that the module you're trying to debug hasn't been loaded yet, so doing `function_to_debug:b` doesn't work. The easiest way to work around this is to scope your breakpoint location by doing ``::bp module_name`function_to_debug``, so that `kmdb` will automatically set the breakpoint when the module is loaded (assuming you spelled the module and function name correctly).

If you forget to set up `kmdb` and your kernel panics during (or shortly after) boot three times in a row, you will be rebooted into the Delphix OS recovery environment. This is a stripped-down version of Delphix OS which allows you to SSH into the VM as `root`, run diagnostics, and (maybe) fix things up. You can find more information about the recovery environment [here](https://docs.google.com/document/d/1J_JQTHirXaXdzGBoaIDjQX9EFzBByhMpR3Ufx88w3hM).
