This document describes where various features of pull streams
are used, through simple real-world examples.

Much of the focus here is handling the error cases. Indeed,
distributed systems are _all about_ handling the error cases.

# A simple source that ends correctly. (read, end)

A normal file (source) is read and sent to a sink stream
that computes some aggregation of that input, such as
the number of bytes or the number of occurrences of the `\n`
character (i.e. the number of lines).

The source reads a chunk of the file each time it's called.
There is some optimum chunk size, depending on your operating system,
file system, physical hardware,
and how many other files are being read concurrently.

When the sink gets a chunk, it iterates over the characters in it,
counting the `\n` characters. When the source returns `end` to the
sink, the sink calls a user-provided callback.
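
The flow above can be sketched without any library, using only the `read(abort, cb)` convention. This is a minimal sketch, not a definitive implementation: `chunkSource` and `countLines` are hypothetical names, and an in-memory array of strings stands in for chunked reads from a real file.

```javascript
// Source: each call to read(abort, cb) hands back the next chunk, or calls
// back with end === true when nothing is left.
// Assumption: the in-memory array stands in for chunked file reads.
function chunkSource (chunks) {
  let i = 0
  return function read (abort, cb) {
    if (abort) return cb(abort)
    if (i >= chunks.length) return cb(true)
    cb(null, chunks[i++])
  }
}

// Sink: keeps asking for chunks and counts the '\n' characters in each.
// When the source reports `end`, it calls the user-provided callback.
function countLines (done) {
  return function sink (read) {
    let lines = 0
    read(null, function next (end, chunk) {
      if (end === true) return done(null, lines)
      if (end) return done(end)
      for (const ch of chunk) if (ch === '\n') lines++
      // A synchronous source makes this recurse; fine for a short sketch.
      read(null, next)
    })
  }
}

let lineCount
countLines(function (err, n) {
  if (err) throw err
  lineCount = n
})(chunkSource(['one\ntw', 'o\nthree\n']))
```

Note that a line split across two chunks is still counted once, because only the `\n` characters are counted and not the fragments between them.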

# A source that may fail. (read, err, end)

A file is downloaded over HTTP and written to a file.
The network should always be considered unreliable,
and you must design your system to recover if the download
fails (for example, if the wifi were to cut out).

The read stream is just the HTTP download, and the sink
writes it to a temporary file. If the source ends normally,
the temporary file is moved to the correct location.
If the source errors, the temporary file is deleted.

(You could also write the file to the correct location
and delete it if it errors, but the temporary file method has the advantage
that if the computer or process crashes, it leaves only a temporary file
and not a file that appears valid. Stray temporary files can be cleaned up
or resumed when the process restarts.)
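
A rough sketch of the temporary-file approach follows. The names `flakySource` and `writeViaTemp` are hypothetical, the download is simulated by an in-memory source that can fail partway through, and synchronous `fs` calls are used only to keep the example short.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Hypothetical stand-in for an HTTP download: yields chunks, then either
// ends normally or fails with an error once `failAfter` chunks were sent.
function flakySource (chunks, failAfter) {
  let i = 0
  return function read (abort, cb) {
    if (abort) return cb(abort)
    if (i === failAfter) return cb(new Error('connection lost'))
    if (i >= chunks.length) return cb(true)
    cb(null, chunks[i++])
  }
}

// Sink: append to a temporary file next to the destination.
// Normal end: move it into place. Error: delete it.
function writeViaTemp (dest, done) {
  const tmp = dest + '.tmp'
  return function sink (read) {
    fs.writeFileSync(tmp, '')
    read(null, function next (end, chunk) {
      if (end === true) { fs.renameSync(tmp, dest); return done(null) }
      if (end) { fs.unlinkSync(tmp); return done(end) }
      fs.appendFileSync(tmp, chunk)
      read(null, next)
    })
  }
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'pull-'))
const okFile = path.join(dir, 'ok.txt')
const badFile = path.join(dir, 'bad.txt')
let okErr, badErr

// A failAfter of -1 never matches, so this download ends normally.
writeViaTemp(okFile, function (err) { okErr = err })(flakySource(['a', 'b'], -1))
// This download fails after the first chunk.
writeViaTemp(badFile, function (err) { badErr = err })(flakySource(['a', 'b'], 1))
```

After the failed download neither `bad.txt` nor `bad.txt.tmp` exists, while the successful one leaves only `ok.txt`.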

# A sink that may fail

If we read a file from disk and upload it, then the upload is the sink that may error.
The file system is probably faster than the upload, so
the source will mostly be waiting for the sink to ask for more data.
Usually the sink calls `read(null, cb)` and the source retrieves chunks of the file
until the file ends. If the sink errors, it then calls `read(true, cb)`,
and the source closes the file descriptor and stops reading.
In this case the whole file is never loaded into memory.
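
The exchange between the two sides might look like the sketch below. Everything named here is an assumption made for illustration: `fileSource` is an in-memory stand-in whose `state.closed` flag represents closing the file descriptor, and `send` is a hypothetical upload function that fails on one particular chunk.

```javascript
// Source: a hypothetical in-memory "file". Setting state.closed stands in
// for closing the file descriptor.
function fileSource (chunks, state) {
  let i = 0
  return function read (abort, cb) {
    if (abort || i >= chunks.length) {
      state.closed = true
      return cb(abort || true)
    }
    state.reads++
    cb(null, chunks[i++])
  }
}

// Sink: a hypothetical uploader. If `send` fails, it tells the source to
// stop with read(true, cb) and only then reports the error.
function uploadSink (send, done) {
  return function sink (read) {
    read(null, function next (end, chunk) {
      if (end === true) return done(null)
      if (end) return done(end)
      send(chunk, function (err) {
        if (err) return read(true, function () { done(err) })
        read(null, next)
      })
    })
  }
}

const fileState = { closed: false, reads: 0 }
let uploadErr

// Assumption: the upload breaks while sending chunk 'b'.
function send (chunk, cb) {
  cb(chunk === 'b' ? new Error('upload failed') : null)
}

uploadSink(send, function (err) { uploadErr = err })(
  fileSource(['a', 'b', 'c', 'd'], fileState)
)
```

Only two of the four chunks are ever read: the source hands out data strictly on request, so a failed sink stops the reading immediately.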

# A sink that may fail out of turn.

An HTTP client connects to a log server and tails a log in real time.
(Another process will write to the log file,
but we don't need to worry about that.)

The source is the server's log stream, and the sink is the client.
First the source outputs the old data. This will always be a fast
response, because that data is already at hand. When the old data has all
been written, the output rate may drop significantly because the server (the source) will
wait for new data to be added to the file. Therefore,
it becomes much more likely that the sink will error (for example, if the network connection
drops) while the source is waiting for new data. Because of this,
it's necessary to be able to abort the stream while a read is pending (after you called
read, but before it called back). If it were not possible to abort
out of turn, you'd have to wait for the next read before you could abort,
but, depending on the source of the stream, the next read may never come.
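
One way a source could support aborting out of turn is sketched below. `tailSource` and its `push` method are hypothetical: `push` simulates another process appending a line to the log, and the pending read's callback is kept in `waiting` so that an abort can release it.

```javascript
// Source: a hypothetical log tail. Old lines are returned immediately; after
// that a read stays pending until push() supplies a new line.
function tailSource (oldLines) {
  const buffer = oldLines.slice()
  let waiting = null // callback of a read that has not called back yet
  let ended = false

  function read (abort, cb) {
    if (abort) {
      ended = abort
      // Out-of-turn abort: release the pending read first, then confirm.
      if (waiting) { const w = waiting; waiting = null; w(abort) }
      return cb(abort)
    }
    if (ended) return cb(ended)
    if (buffer.length) return cb(null, buffer.shift())
    waiting = cb
  }

  // Simulates another process appending a line to the log.
  read.push = function (line) {
    if (ended) return
    if (waiting) { const w = waiting; waiting = null; w(null, line) } else buffer.push(line)
  }
  return read
}

const tail = tailSource(['old 1', 'old 2'])
const seen = []
let finished = null // stays null until the pending read is released

function onLine (end, line) {
  if (end) { finished = end; return }
  seen.push(line)
  tail(null, onLine)
}

tail(null, onLine)          // drains the old lines, then waits
tail.push('new 1')          // delivered to the waiting read
tail(true, function () {})  // connection dropped: abort while a read is pending
tail.push('new 2')          // ignored, the stream has already ended
```

Without the `waiting` bookkeeping, the abort would have nothing to release and the client's last read would hang until the log happened to grow again.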

# A through stream that needs to abort.

Say we wish to read from a file (source), parse each line as JSON (through),
and then output to another file (sink).
If the parser encounters invalid JSON then it will error and,
if this parsing error is fatal, the parser needs to abort the pipeline
from the middle. Here the source reads normally, but then the through fails.
When the through finds an invalid line, it should first abort the source,
and then call back to the sink with an error. This way,
by the time the sink receives the error, the entire stream has been cleaned up.

(You could abort the source and error back to the sink in parallel.
However, if something happened to the source while aborting, for the user
to discover this error they would have to call the source again with another callback. As
this situation would occur only rarely, users would be inclined not to handle it, leading to
the possibility of undetected errors.
Therefore, as it is better to have one callback at the sink, wait until the source
has finished cleaning up before calling back to the sink with an error.)

In some cases you may want the stream to continue, and the through stream can just ignore
any lines that do not parse. An example where you definitely
want a through stream to abort on invalid input would be an encrypted stream, which
should be broken into chunks that are encrypted separately.
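
The abort-first ordering described above can be sketched as follows. `parseJson` and `lineSource` are hypothetical names, the source is an in-memory list of lines instead of a file, and the `events` array exists only to show the order in which things happen.

```javascript
// Through: parses each line as JSON. On an invalid line it aborts the source
// first, and passes the error to the sink only after the source called back.
function parseJson () {
  return function through (read) {
    let ended = null
    return function (abort, cb) {
      if (ended) return cb(ended)
      if (abort) { ended = abort; return read(abort, cb) }
      read(null, function (end, line) {
        if (end) { ended = end; return cb(end) }
        let value
        try {
          value = JSON.parse(line)
        } catch (err) {
          ended = err
          // Wait for the source to finish cleaning up before erroring.
          return read(err, function () { cb(err) })
        }
        cb(null, value)
      })
    }
  }
}

const events = []

// Hypothetical in-memory source that records when it is aborted.
function lineSource (lines) {
  let i = 0
  return function read (abort, cb) {
    if (abort) { events.push('source aborted'); return cb(abort) }
    if (i >= lines.length) return cb(true)
    cb(null, lines[i++])
  }
}

const parsed = []
const jsonLines = parseJson()(lineSource(['{"a":1}', 'not json', '{"b":2}']))

jsonLines(null, function next (end, value) {
  if (end) {
    events.push(end === true ? 'sink ended' : 'sink got error')
    return
  }
  parsed.push(value)
  jsonLines(null, next)
})
```

The sink sees a single callback carrying the error, and by then `events` already records that the source was aborted. The third line is never read.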