Roadmap and Improvement Proposals
The project introduction explains the overview and goals of DataFusion, and our development efforts largely align to that vision.
Planning EPICs
DataFusion uses GitHub issues to track planned work. We collect related tickets using tracking issues labeled with [EPIC], which contain discussion and links to more detailed items.
Epics offer a high-level roadmap of what the DataFusion community is thinking about. The epics are not meant to restrict possibilities, but rather to help the community see where development is headed, align our work, and inspire additional contributions.
As this project is entirely driven by volunteers, we welcome contributions for items not currently covered by epics. However, before submitting a large PR, we strongly request that you start a conversation using a GitHub issue or the dev@arrow.apache.org mailing list to make review efficient and avoid surprises.
Quarterly Roadmap
The DataFusion roadmap is driven by the priorities of contributors rather than any single organization or coordinating committee. We typically discuss our roadmap using GitHub issues, approximately quarterly, and invite you to join the discussion.
For more information:
Improvement Proposals
Discussing New Features
If you plan to work on a new feature that doesn’t have an existing ticket, it is a good idea to open a ticket to discuss the feature. Advance discussion often helps avoid wasted effort by determining early whether the feature is a good fit for DataFusion, before too much time is invested. A ticket also helps gather feedback from the community, and a proposal is likely easier to discuss there than in a 1000-line PR.
If you open a ticket and it doesn’t get any response, you can try @-mentioning recently active community members in the ticket to get their attention.
What Contributions are Good Fits?
DataFusion is designed to be highly extensible, and many features can be implemented as extensions without changes or additions to the core. Support for new functions, data formats, and similar functionality can be added using the extension APIs, and many community-supported extensions are already listed in the extensions list.
Query engines are complex pieces of software to develop and maintain. Given our limited maintenance bandwidth, we try to keep the DataFusion core as simple and focused as possible, while still satisfying the design goal of an easy-to-start initial experience.
With that in mind, contributions that meet the following criteria are more likely to be accepted:
Bug fixes for existing features
Test coverage for existing features
Documentation improvements / examples
Performance improvements to existing features (with benchmarks)
“Small” functional improvements to existing features (if they don’t change existing behavior)
Additional APIs for extending DataFusion’s capabilities
CI improvements
Contributions that will likely involve more discussion (see Discussing New Features above) prior to acceptance include:
Major new functionality (even if it is part of “standard SQL”)
New functions, especially if they aren’t part of “standard SQL”
New data sources (e.g. support for Apache ORC)
Design Build vs. Big Up Front Design
Typically, the DataFusion community attacks large problems by solving them bit by bit and refining a solution iteratively on the main branch as a series of Pull Requests. This is different from projects which front-load the effort with a more comprehensive design process.
By “advancing the front” the community always makes tangible progress, and the strategy is especially effective in a project that relies on individual contributors who may not have the time or resources to invest in a large upfront design effort. However, this “bit by bit” approach doesn’t always succeed, and sometimes we get stuck or go down the wrong path and then have to change direction.
Our process necessarily results in imperfect solutions being the “state of the code” in some cases, and larger visions are not yet fully realized. However, the community is good at driving things to completion in the long run. If you see something that needs improvement or an area that is not yet fully realized, please consider submitting an issue or PR to improve it. We are always looking for more contributions.