Documentation

Data Captured

The data attempts to cover all public companies trading on public exchanges (NASDAQ, NYSE, AMEX), and tries to exclude undesirable entities such as pure investment funds (ex: “Barclays Bond Fund II”), pre-merger SPACs, and other not-quite-a-company entities. This makes our aggregated metrics a better representation of US corporate business performance. Additionally, companies that do not release financial data in the form of 10-Ks and 10-Qs (i.e. non-US filers) are not used in FinAgg.

If a company is delisted from exchanges, its data is NOT retroactively removed from our database. This is done in the belief that it helps prevent survivorship bias when viewing our data from a high-level economic perspective (a bias that overhangs traditional indexes like the Dow Jones Industrial Average or S&P 500).

The financial filing data is updated every Monday-Friday weeknight and is reflected for users before midnight (EST) the same day. The data gathering process utilizes freely available data from the SEC EDGAR filings and exchange tickers.

At this time, the only information utilized by FinAgg is company Income Statements, Cash Flow Statements, Balance Sheets, Shares Outstanding, Sector, Price Data, and listing status.

Data Format

Dates

FinAgg presents its data on a last-twelve-month (“LTM”) rolling basis. This smooths the data and captures a full year's seasonality at all times. Financial ratios and growth metrics are year-over-year (“YoY”) and LTM-based.
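As a rough illustration of the LTM and YoY ideas, the sketch below sums the latest four quarters of a flow item (such as sales) and compares each LTM value with the one four quarters earlier. The function names and the sample figures are hypothetical, and this is not FinAgg's actual implementation. How point-in-time items such as balance sheet values are treated is not covered here.

```python
def ltm_series(quarterly):
    """Rolling sum of the latest four quarters; None until four quarters exist."""
    out = []
    for i in range(len(quarterly)):
        if i < 3:
            out.append(None)
        else:
            out.append(sum(quarterly[i - 3 : i + 1]))
    return out


def yoy_growth(ltm):
    """Growth of each LTM value versus the LTM value four quarters earlier."""
    out = []
    for i, value in enumerate(ltm):
        prior = ltm[i - 4] if i >= 4 else None
        if value is None or prior is None or prior == 0:
            out.append(None)
        else:
            out.append(value / prior - 1)
    return out


# Eight hypothetical quarters of sales for one company (oldest first).
sales = [100, 120, 90, 150, 110, 130, 100, 160]
ltm = ltm_series(sales)
growth = yoy_growth(ltm)
```

Note that a YoY LTM growth figure needs eight quarters of history: four for the current LTM window and four for the window it is compared against.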

All periods are presented on a calendar-year basis, not a financial-year basis. So if a company's financial year ends in June, FinAgg will present this as Q2, not Q4. This is done to establish consistency so users can compare company performance across the same calendar periods.

However, this adjustment introduces some complication. It raises the question “Ok, but what thresholds do you use to discriminate calendar periods?” The answer is that FinAgg tries to classify a financial period into the calendar period that captures the bulk of that financial activity. Consider the following image:


An illustration showing how fiscal period-end gets bucketed into calendar period-end
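One way to picture the "bulk of the activity" rule is to count how many days of a financial period fall in each calendar quarter and pick the quarter with the most. The sketch below does exactly that. It is an assumption about how such a rule could work, not FinAgg's actual thresholds, and the function names are hypothetical.

```python
from datetime import date, timedelta


def calendar_quarter(day):
    """Return the (year, quarter) that a calendar day belongs to."""
    return day.year, (day.month - 1) // 3 + 1


def bucket_period(start, end):
    """Return the calendar (year, quarter) holding the most days of [start, end]."""
    counts = {}
    day = start
    while day <= end:
        key = calendar_quarter(day)
        counts[key] = counts.get(key, 0) + 1
        day += timedelta(days=1)
    # On a tie, the earlier quarter wins because it was inserted first.
    return max(counts, key=counts.get)


# A financial quarter running November through January mostly sits in Q4.
retailer_quarter = bucket_period(date(2022, 11, 1), date(2023, 1, 31))
# A financial year-end quarter running April through June is calendar Q2.
june_year_end = bucket_period(date(2023, 4, 1), date(2023, 6, 30))
```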


So to be more explicit:

The "Current" period

FinAgg provides a “Current” period, which reflects the latest data. The Current period is defined as the latest filing data from the last completed calendar quarter or, if that is not available, from the previous quarter (i.e. late filers). If FinAgg does not have data on a company within the last two calendar quarters, then there is no data available for the Current period. Also noteworthy: any metrics using share price or market capitalization (ex: P/E, EV, etc.) use the latest trading day's close price instead of a period average.
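The fallback rule described above can be sketched as follows: look for data in the last completed calendar quarter, then in the quarter before it, and otherwise report nothing. The data layout (a dictionary keyed by year and quarter) and the function names are assumptions made for illustration only.

```python
from datetime import date


def last_completed_quarter(today):
    """Calendar (year, quarter) that most recently ended before `today`."""
    quarter = (today.month - 1) // 3 + 1
    if quarter == 1:
        return today.year - 1, 4
    return today.year, quarter - 1


def previous_quarter(year, quarter):
    """The calendar quarter immediately before (year, quarter)."""
    return (year - 1, 4) if quarter == 1 else (year, quarter - 1)


def current_period(filings, today):
    """Pick the Current period from `filings`, a dict of (year, quarter) -> data.

    Returns the matching (key, data) pair, or None if neither of the last
    two calendar quarters has data.
    """
    latest = last_completed_quarter(today)
    for key in (latest, previous_quarter(*latest)):
        if key in filings:
            return key, filings[key]
    return None


today = date(2024, 5, 15)  # last completed calendar quarter is Q1 2024
late_filer = {(2023, 4): {"sales": 120.0}}
stale_company = {(2023, 3): {"sales": 80.0}}
```

Here `current_period(late_filer, today)` falls back to Q4 2023, while `current_period(stale_company, today)` returns `None` because the data is more than two quarters old.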

Other complicated accounting situations

FinAgg is aware that financial reporting can get very complicated. Examples include companies changing their financial period-end dates, successor-predecessor accounting, or amended financial filings. These can cause all sorts of disruptions to continuity. Rest assured FinAgg takes sophisticated approaches to these types of scenarios!

Lineitem Inferencing

Financial statements can be unusually organized at times (the income statement being particularly notorious). Sometimes very common lineitems are not stated explicitly, which would ordinarily leave that measure blank. However, it is sometimes possible to "figure out" such values even when they are not explicitly stated. Rather than leave a value as missing, FinAgg will try to infer it where possible. For example, some companies do not explicitly state Operating Income, and some pre-revenue companies do not state Revenue, but these can be calculated from the other lineitems around them. At this time, only Sales and Operating Income are inferred.
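To give a feel for what inferring a lineitem could look like, the sketch below falls back to simple accounting identities when a value is not reported. The specific identities used (Operating Income as Gross Profit minus Operating Expenses, Sales as Gross Profit plus Cost of Sales) and the field names are assumptions chosen for illustration, and FinAgg's actual inference rules may differ.

```python
def infer_operating_income(items):
    """Return reported Operating Income, or derive it when the parts exist."""
    if items.get("operating_income") is not None:
        return items["operating_income"]
    gross = items.get("gross_profit")
    opex = items.get("operating_expenses")
    if gross is not None and opex is not None:
        return gross - opex  # assumed identity, for illustration only
    return None  # not enough information: leave the measure blank


def infer_sales(items):
    """Return reported Sales, or derive it from Gross Profit and Cost of Sales."""
    if items.get("sales") is not None:
        return items["sales"]
    gross = items.get("gross_profit")
    cost = items.get("cost_of_sales")
    if gross is not None and cost is not None:
        return gross + cost  # assumed identity, for illustration only
    return None


# Hypothetical income statement that omits Operating Income and Sales.
statement = {"gross_profit": 400.0, "cost_of_sales": 600.0, "operating_expenses": 250.0}
operating_income = infer_operating_income(statement)
sales = infer_sales(statement)
```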

Financial Metrics

In addition to the basic lineitem measures taken from the Income Statement, Cash Flow Statement, and Balance Sheet, FinAgg computes all the common financial metrics and ratios that use lineitems from those statements. Below are some core definitions (not an exhaustive list):

Special note: If selecting the "Current" period, then metrics using share price data use the latest closing price for calculation. Additionally, if a measure uses the count of shares outstanding, then it uses what the company discloses in its filings, not what the exchanges report. Companies undergoing stock splits/consolidations or large sudden issuances/buybacks may appear distorted until period end, when they report the updated share count in their filings.
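The distortion mentioned in the special note can be seen with a small worked example. The sketch below combines the latest closing price with the share count from the most recent filing. The function names and all figures are hypothetical, and returning no P/E for non-positive earnings is an assumed convention rather than documented FinAgg behavior.

```python
def market_cap(close_price, filed_shares):
    """Market capitalization from the latest close and filing-reported shares."""
    return close_price * filed_shares


def price_to_earnings(close_price, filed_shares, ltm_net_income):
    """P/E on an LTM basis; None when earnings are not positive (assumed)."""
    if ltm_net_income <= 0:
        return None
    return market_cap(close_price, filed_shares) / ltm_net_income


# Hypothetical company: 10M shares in its last filing, trading at 100.
before_split = market_cap(100.0, 10_000_000)
# After a 2-for-1 split the price halves, but the last filing still says 10M.
during_lag = market_cap(50.0, 10_000_000)
# Once the next filing reports 20M shares, the figure is consistent again.
after_filing = market_cap(50.0, 20_000_000)
```

In this example the market capitalization appears to halve between the split and the next filing, even though nothing about the company's value has changed.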

Sector classification

Public companies identify themselves by SIC codes on the cover of their financial filings. This is a classification scheme produced by the Bureau of Labor Statistics. It is a hierarchy of 4 levels of increasing detail (ex: think of Retail -> Apparel -> Women's clothing -> Women's Shoes).

FinAgg remaps these labels to more desirable categories. The reason is that some of the SIC's “Divisions” at the first level can be overly broad (ex: “Finance, Insurance, and Real Estate”), while the second level's “Major Groups” can be too granular (ex: “Pipelines, except natural gas”).

FinAgg uses the following remapping scheme:


Details on how FinAgg reassigns company sectors
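Mechanically, a remapping like this can be a lookup on the first two digits of the SIC code (the Major Group). The sketch below shows the idea with a handful of invented categories. The category names and groupings are purely hypothetical and are not FinAgg's actual scheme.

```python
# Hypothetical remapping keyed by ranges of 2-digit SIC Major Groups.
# These labels are invented for illustration; they are not FinAgg's categories.
REMAP = [
    (range(40, 50), "Transportation & Utilities"),  # includes Major Group 46
    (range(60, 63), "Banking & Brokerage"),         # Major Groups 60, 61, 62
    (range(63, 65), "Insurance"),                   # Major Groups 63, 64
    (range(65, 66), "Real Estate"),                 # Major Group 65
]


def remap_sector(sic_code):
    """Map a SIC code (int or str, up to 4 digits) to a remapped category."""
    major_group = int(str(sic_code).zfill(4)[:2])
    for groups, label in REMAP:
        if major_group in groups:
            return label
    return "Other"


bank = remap_sector(6021)
insurer = remap_sector("6311")
pipeline = remap_sector(4610)
```

This splits the broad “Finance, Insurance, and Real Estate” Division into separate categories while folding a narrow Major Group such as pipelines into a wider one.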



Data Integrity

Because FinAgg collects its data from external sites, it is inherently reliant on these sources for website availability, data integrity, and timeliness of updates. While the data very rarely has issues in these regards (i.e. far less than 1% of the total data ingested), issues do sometimes occur. Here are some examples FinAgg has experienced:

FinAgg makes a conscious effort to detect anomalies in the data it ingests, investigate the root cause, develop algorithms to correct them, and be a nice responsible citizen by notifying the distributors of these issues. However, FinAgg's correction algorithms are probability-based (i.e. while they have been rigorously tested, they are never perfect), and new issues can always emerge and remain present in the data until FinAgg addresses them.

While these issues are not ideal (and could be fixed by an army of data entry analysts), FinAgg's approach of using easily available data and algorithmic processing is what helps keep this service relatively labor-light and free to users.

Filtering & Charting

After a user applies filters to the underlying data they plan to visualize, an aggregation process follows which computes statistics such as sums, medians or averages over all companies (ex: “What is the total sum of sales for all companies across time?”). FinAgg also performs these same aggregations by individual sector. Since these aggregations are rollups of the data that was filtered by the user, it is important not to misinterpret them as representing the full population. So for example, if the user filters out Manufacturing companies, then the aggregated sum of sales available for visualization would NOT include manufacturing companies.
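The order of operations matters here: filter first, then aggregate. The sketch below uses a tiny hypothetical dataset to show that the rollups only ever describe the filtered companies. The data layout and function names are assumptions for illustration.

```python
from statistics import median

# Hypothetical universe of four companies.
companies = [
    {"name": "A", "sector": "Manufacturing", "sales": 500.0},
    {"name": "B", "sector": "Retail", "sales": 300.0},
    {"name": "C", "sector": "Retail", "sales": 100.0},
    {"name": "D", "sector": "Technology", "sales": 200.0},
]


def aggregate(rows, measure):
    """Sum, median and mean of one measure over the given rows."""
    values = [r[measure] for r in rows]
    if not values:
        return None
    return {
        "sum": sum(values),
        "median": median(values),
        "mean": sum(values) / len(values),
    }


def aggregate_by_sector(rows, measure):
    """The same statistics, computed separately for each sector."""
    groups = {}
    for r in rows:
        groups.setdefault(r["sector"], []).append(r)
    return {sector: aggregate(members, measure) for sector, members in groups.items()}


# The user filters out Manufacturing before anything is aggregated.
filtered = [r for r in companies if r["sector"] != "Manufacturing"]
overall = aggregate(filtered, "sales")
by_sector = aggregate_by_sector(filtered, "sales")
full_population = aggregate(companies, "sales")
```

The filtered total of sales is smaller than the full-population total, and Manufacturing does not appear in the per-sector results at all.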

Chart Types

Depending on the parameters selected by the user, a chart will be generated based on the best way to view that data. It depends on 3 decisions: Do you want to see a snapshot in time or a timeseries? How many variables are you trying to visualize? What types of variables are you trying to visualize (numeric vs. categorical)? Below is a guide.


An illustration showing how FinAgg generates different charts depending on the variable(s) specified


Note that any companies that do not have data for the measures you are visualizing will be entirely omitted from that chart. So for example, if the user filtered for a universe of 1,000 companies and tries to visualize a scatterplot of 2 measures for the Current period, then fewer than 1,000 points will likely be displayed, because not all companies may have data for both of those measures in that period.
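In code terms, this omission amounts to dropping any company that is missing either measure before plotting. A minimal sketch, assuming missing values are stored as `None` and using hypothetical field names:

```python
def scatter_points(rows, x_measure, y_measure):
    """Keep only companies that have a value for both measures."""
    return [
        (r[x_measure], r[y_measure])
        for r in rows
        if r.get(x_measure) is not None and r.get(y_measure) is not None
    ]


# Hypothetical universe: company B has no reported margin.
universe = [
    {"name": "A", "sales_growth": 0.10, "margin": 0.25},
    {"name": "B", "sales_growth": 0.05, "margin": None},
    {"name": "C", "sales_growth": -0.02, "margin": 0.12},
]
points = scatter_points(universe, "sales_growth", "margin")
```

Three companies go in, but only two points would be drawn.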

Hyperdimensional scatterplots

A special explanation on the "hyperdimensional" chart above: Sometimes users may want to view more dimensions than is possible with the human eye. For this, FinAgg uses a special statistical approach called Principal Component Analysis. This allows a multidimensional space to be "projected" down to 2-D for easy viewing.

How can you understand it? Consider the basic example below illustrating the 4 dimensions of crime reported in each US state. The x-axis and y-axis are the principal components (but just ignore them!). The measures of interest are the red arrows that originate from the center of the chart and sprawl out. These are your selected variables.



An example of viewing a 4D space projected down onto a 2D plane

Following an arrow in the direction it points represents an increase in that measure, while going in the opposite direction represents a decrease. The origin does not represent a zero value. Further, two arrows pointing in the same direction indicate positive correlation (ex: Assault and Murder), while arrows pointing in opposite directions would indicate negative correlation (none depicted here). Perpendicular arrows (ex: Urban Pop and Assault) indicate no correlation.

Now, looking at where the samples lie, we can see that California is plotted (top-right) in an area of the chart where Urban Population, Rape, Assault, and Murder are all high (ouch!). Mississippi (bottom-right) suffers from these same crimes too, but its populace resides less in urban centers. New Hampshire (mid-left) is in the area that reflects low rates of violent crime and a medium level of urban population.

After reviewing this example, we can see how this multi-dimensional charting technique allows for a dense amount of information to be visualized by the user at once. However, it is not a perfect representation. Some information is "lost" when forcing a multidimensional space onto a 2D plane (see conceptual diagram below). But how much information is lost? It depends. If viewing this type of chart, FinAgg will print the percentage of variance that it was able to capture above the chart (the rest being lost for viewing). Typically it is minor because a core part of Principal Component Analysis is to "rotate" the multidimensional space in a way that is optimal for viewing. Overall, this can be one of the most interesting ways to see and learn from data on FinAgg, so I hope you enjoy!


An example of a 3D object losing some of its information as it is projected onto a 2D plane
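For the curious, the projection and the "percentage of variance captured" figure can be sketched in plain Python. The example below standardizes the measures, finds the top two principal components by power iteration with deflation (a simple technique chosen here to avoid extra dependencies; a numerical library would normally be used instead), and reports the share of total variance those two components retain. Everything here, including the function names and sample data, is illustrative and not FinAgg's implementation.

```python
import math


def standardize(rows):
    """Center each column and scale it to unit sample variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(d)
    ]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def covariance(z):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(z), len(z[0])
    return [
        [sum(r[i] * r[j] for r in z) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_eigenpair(matrix, iterations=1000):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    d = len(matrix)
    v = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(a * b for a, b in zip(v, mv)), v


def pca_2d(rows):
    """Project rows onto the top two principal components.

    Returns the 2-D points and the fraction of total variance they capture.
    """
    z = standardize(rows)
    cov = covariance(z)
    d = len(cov)
    total_variance = sum(cov[i][i] for i in range(d))
    l1, v1 = top_eigenpair(cov)
    # Deflation: remove the first component, then find the next largest.
    deflated = [
        [cov[i][j] - l1 * v1[i] * v1[j] for j in range(d)] for i in range(d)
    ]
    l2, v2 = top_eigenpair(deflated)
    points = [
        (sum(a * b for a, b in zip(r, v1)), sum(a * b for a, b in zip(r, v2)))
        for r in z
    ]
    return points, (l1 + l2) / total_variance


# Hypothetical data: 5 companies x 3 measures, where the third measure is the
# sum of the first two, so the points lie on a plane inside 3-D space.
data = [[1, 2, 3], [2, 1, 3], [3, 5, 8], [4, 3, 7], [5, 6, 11]]
points, captured = pca_2d(data)
print(f"Variance captured: {captured:.0%}")
```

Because this sample data already lies on a flat plane, essentially nothing is lost in the projection. With real, messier measures the captured share would be lower.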


Financial Calendar

The "Financial Calendar" uses machine learning to estimate date ranges for upcoming filings. This is based on the expected upcoming filing type (i.e. Annual filings are allowed longer deadlines than Quarterly filings by the SEC) and the historic patterns of the individual company's releases (i.e. some are habitually early or late filers).
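To make the two inputs concrete, here is a deliberately simple stand-in: it keeps a separate history of filing lags (days between period end and filing) for each filing type and uses the fastest and slowest past lags as the estimated window. This is a plain historical-range heuristic, not machine learning and not FinAgg's model. The function name, data layout, and sample lags are all hypothetical.

```python
from datetime import date, timedelta


def estimate_filing_window(period_end, filing_type, lag_history):
    """Estimate (earliest, latest) filing dates from past filing lags.

    lag_history maps a filing type to the number of days the company took
    to file after each past period end.
    """
    lags = lag_history.get(filing_type)
    if not lags:
        return None  # no history for this filing type
    earliest = period_end + timedelta(days=min(lags))
    latest = period_end + timedelta(days=max(lags))
    return earliest, latest


# Hypothetical history: this company takes longer to file annual reports.
history = {"10-K": [58, 60, 61], "10-Q": [38, 40, 41]}
period_end = date(2023, 12, 31)
window = estimate_filing_window(period_end, "10-K", history)
```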

Of course, this is only a prediction. The accuracy of this model has been validated at 79% (which is okay). It is just a guideline for your awareness, so please check the investor relations section of the company's website for announcements of filing releases or earnings calls.

Be aware that earnings calls and press releases are typically the first ways companies disclose financial information to the public (and may be a few days earlier than when the full financial filings are released), so do not treat the financial calendar as guidance to when financial news first breaks out to the public!

Listing/Delisting

This section shows activity of companies being listed or delisted on the stock exchanges.

Technical note: any company undergoing a name or ticker change will temporarily appear here as “Delisted” until the SEC EDGAR system reflects the update in its database (usually a couple of business days), after which this section will self-correct and remove the delisting.