Data collection for cryptocurrency using CryptoFeed, Arctic, kdb+, and AWS EC2.
As an avid researcher of bitcoin market microstructure, I have come across many methods for collecting and cleaning high-frequency data from crypto venues. In this piece, I'll explain what I believe is by far the best way to collect and store such data over the long term, in particular asynchronous trade data arriving from a variety of exchanges.
Fragmentation
Before we can learn how to work with crypto data, we must first understand its characteristics. One of its distinguishing features is fragmentation. If you've ever traded cryptocurrencies, you'll know that traded volume is spread across a large number of exchanges. That means you have to listen to messages from each of them in order to capture all of the information arriving from these venues. Fragmentation also implies asynchronous data, that is, non-deterministic arrival rates. And because of an almost complete lack of regulation, concepts such as a central limit order book (CLOB) and order routing are largely absent from the crypto landscape.
Inadequacy of technology
If you were around in early December 2017, you would have seen nearly every major crypto venue suffer matching-engine problems as retail mania and institutional investors flooded vulnerable exchanges. Even outside such periods, as a simple Google search reveals, technical issues and maintenance outages are common. That means our data collection engine must be able to cope with such instability.
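To make "coping with instability" concrete, here is a minimal sketch of one common approach: retrying a dropped connection with exponential backoff. It is not taken from any particular library. The names `run_with_reconnect` and `make_flaky_connection` are hypothetical, and the outage is simulated rather than coming from a real exchange.

```python
import asyncio
import random


async def run_with_reconnect(connect, max_retries=5, base_delay=0.01, max_delay=1.0):
    """Call `connect()` until it finishes cleanly, backing off after failures.

    `connect` is a hypothetical coroutine function that opens a connection,
    consumes messages, and raises ConnectionError when the venue drops us.
    Returns the number of failed attempts that preceded the clean run.
    """
    failures = 0
    while True:
        try:
            await connect()
            return failures
        except ConnectionError:
            failures += 1
            if failures > max_retries:
                raise  # give up and let the caller decide what to do
            # Exponential backoff with jitter, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (failures - 1))
            await asyncio.sleep(delay * random.uniform(0.5, 1.0))


def make_flaky_connection(fail_times):
    """Build a fake connection that fails `fail_times` times, then succeeds."""
    state = {"calls": 0}

    async def connect():
        state["calls"] += 1
        if state["calls"] <= fail_times:
            raise ConnectionError("simulated exchange outage")

    return connect


# Three simulated outages are absorbed before the connection succeeds.
recovered_after = asyncio.run(run_with_reconnect(make_flaky_connection(3)))
```

The jitter is there so that many collectors reconnecting at once do not all hit a recovering exchange at the same instant. The delays above are tiny so the demo finishes quickly, and a real collector would likely use delays measured in seconds.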
Enter CryptoFeed.
Bryant Moscon created and maintains the CryptoFeed library. It uses Python 3's asyncio and websockets packages to deliver a single feed from an arbitrary number of supported exchanges. CryptoFeed also has solid reconnection logic in place for cases when an exchange API goes down for one reason or another. Clone the library and run this example first. It demonstrates how simple it is to combine streams from different exchanges into a single feed.
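To illustrate the idea of combining several asynchronous streams into one feed, here is a small standard-library sketch. It does not use or reproduce CryptoFeed's API. The exchanges are simulated coroutines, and the names `fake_exchange`, `merged_feed` and `collect_trades`, as well as the message fields, are all hypothetical.

```python
import asyncio


async def fake_exchange(name, prices, delay, queue):
    """Stand-in for one exchange's websocket stream (simulated, no network)."""
    for price in prices:
        await asyncio.sleep(delay)  # each venue ticks at its own pace
        await queue.put({"feed": name, "pair": "BTC-USD", "price": price})


async def merged_feed(sources, on_trade):
    """Run every source concurrently and pass each message to one callback."""
    queue = asyncio.Queue()
    producers = [
        asyncio.create_task(fake_exchange(name, prices, delay, queue))
        for name, prices, delay in sources
    ]
    expected = sum(len(prices) for _, prices, _ in sources)
    for _ in range(expected):
        # Messages come out in arrival order, whichever venue sent them.
        on_trade(await queue.get())
    await asyncio.gather(*producers)


def collect_trades():
    """Merge two simulated venues and return every message received."""
    trades = []
    sources = [
        ("EXCHANGE_A", [100.0, 101.0], 0.01),
        ("EXCHANGE_B", [100.5, 100.7, 100.9], 0.015),
    ]
    asyncio.run(merged_feed(sources, trades.append))
    return trades


trades = collect_trades()
```

A single queue drained by a single callback is what lets downstream code ignore which venue a message came from. The ordering between venues depends on arrival times, which is exactly the non-deterministic behaviour described earlier.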
That was enjoyable and simple, but the data you receive must be stored somewhere. CryptoFeed ships with a good example of storing data in Man AHL's open-source Arctic database. Arctic wraps MongoDB to provide a clean interface for storing Python objects. I even had a great time working on Arctic during Man AHL's 2018 Hackathon in April!
Arctic and high-frequency tick data
The problem with using Arctic for higher-frequency data is that on every update, Arctic has to do a lot of extra work to store the Python object (usually a pandas DataFrame) in MongoDB. If you don't batch updates, this becomes impractical. Arctic must also decompress objects before reading data back. In my experience, this causes a substantial slowdown in read/write times. There are some really elegant solutions to this problem, one of which uses Kafka pub/sub patterns to accumulate enough data before sending it to Arctic, but such patterns are beyond the scope of this article. Either way, a different storage method is required for asynchronous write operations.
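The batching idea can be sketched in a few lines: collect ticks in memory and hand them to the store in chunks, so the per-write overhead is paid once per batch rather than once per tick. This is a minimal sketch, not Arctic's API. `BatchWriter` and its `write_batch` argument are hypothetical, and the sink here is just a Python list.

```python
class BatchWriter:
    """Buffer incoming ticks and hand them to `write_batch` in chunks."""

    def __init__(self, write_batch, batch_size=1000):
        self.write_batch = write_batch  # hypothetical sink, e.g. a store append
        self.batch_size = batch_size
        self.buffer = []

    def add(self, tick):
        """Queue one tick and flush automatically once the batch is full."""
        self.buffer.append(tick)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Write whatever is buffered; does nothing when the buffer is empty."""
        if self.buffer:
            self.write_batch(self.buffer)
            self.buffer = []  # start a fresh list so the written batch is untouched


# Seven ticks with a batch size of three: two full batches are written
# automatically and one tick stays buffered until flush() is called.
writes = []
writer = BatchWriter(writes.append, batch_size=3)
for tick in range(7):
    writer.add(tick)
```

The trade-off is latency and durability: anything still in the buffer is lost if the process dies, so a real collector would probably also flush on a timer and on shutdown.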
Enter kdb+.
kdb+ is a columnar, in-memory time-series database owned and maintained by Kx Systems. Arthur Whitney created kdb+, which is used to store, clean, and analyse massive amounts of time-series data in financial, telecom, and manufacturing applications (and even by F1 teams). In electronic trading, kdb+ is used for messaging, data engineering, and data storage. kdb+ is programmed in its own language, q. The technology is mostly proprietary, but the free 32-bit version can be used for academic research, which is what we want to do.
The primary way to obtain kdb+ is through the official download. kdb+ was also recently added to the Anaconda package repository, where it is simple to download and install. The next step is to create a custom CryptoFeed callback that will pass incoming data to kdb+.
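As a rough sketch of what such a callback could look like, the code below builds a q insert statement from one trade and hands it to a sender function. Several things here are assumptions rather than anything confirmed by the text above: the table name `trades`, its column order (time, feed, pair, side, amount, price), the argument list of the callback (which is illustrative and not CryptoFeed's actual callback signature), and the `send` function, which stands in for whatever kdb+ client you use to submit a q string.

```python
def q_symbol(text):
    """Render text as a q symbol by casting a q string to a symbol."""
    if '"' in text or "\\" in text:
        raise ValueError("unexpected character in symbol text")
    return '`$"{}"'.format(text)


def build_insert(feed, pair, side, amount, price, table="trades"):
    """Build a q statement that appends one trade row to `table`.

    Assumed (hypothetical) schema: time, feed, pair, side, amount, price.
    `.z.p` asks kdb+ to stamp the row with its own current UTC timestamp.
    """
    fields = [
        ".z.p",
        q_symbol(feed),
        q_symbol(pair),
        q_symbol(side),
        "{:.8f}".format(float(amount)),  # fixed-point text avoids exponent notation
        "{:.8f}".format(float(price)),
    ]
    return "`{} insert ({})".format(table, ";".join(fields))


def make_trade_callback(send):
    """Return a callback that forwards each trade to kdb+ through `send`.

    `send` is a placeholder for the function that submits a q string over
    your kdb+ connection; it is injected so the sketch stays client-agnostic.
    """
    def on_trade(feed, pair, side, amount, price):
        send(build_insert(feed, pair, side, amount, price))

    return on_trade


# Demo with a list standing in for the kdb+ connection.
sent = []
on_trade = make_trade_callback(sent.append)
on_trade("COINBASE", "BTC-USD", "buy", 0.5, 9000)
```

Casting strings to symbols keeps pairs such as BTC-USD intact, since a hyphen inside a bare backtick symbol would otherwise be read as subtraction. Stamping rows with the server's clock is the simplest option, but if the exchange timestamp matters for your research you would want to pass it through instead.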
cryptocurrency
Published: