Klahnakoski (talk | contribs) (fix wording) |
Klahnakoski (talk | contribs) m (fix link) |
Line 113:
* A query planner might help with optimization, but I do not believe it will help in this situation: Redshift already indexes the columns for fast filtering and aggregation, and in the case of joins you can control which node your data resides on to minimize communication overhead between nodes.
* SSD drives might improve query performance.
* Other hidden “shallow optimizations” – I have the sense that the number of unknowns in Redshift is still quite large for me. One simple oversight, and all my numbers are irrelevant. See [http://en.wikipedia.org/wiki/Linus%27s_Law Linus's Law]: “With enough eyeballs, all [optimizations] are shallow”.
* More nodes – I have no doubt more nodes can make the whole thing faster, but this must be balanced against cost.
* More efficient data shape – There is an endless set of transformations you can apply to your data to get better query performance. The ActiveData philosophy is against putting effort into this time sink: software is good enough that it should perform this in the background, given the data volume, the data shape, and the queries performed on it.
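
To illustrate the point about joins and data placement in the first bullet, here is a minimal sketch in Python. It is a toy model, not Redshift itself: the four-node cluster, the <code>orders</code> and <code>customers</code> tables, and the modulo "hash" are all assumptions made up for the example. It counts how many rows would have to cross the network for a join, depending on which column decides where each row lives.

```python
NUM_NODES = 4  # hypothetical cluster size, chosen only for this example


def node_for(key, num_nodes=NUM_NODES):
    """Map an integer key to a node; modulo stands in for a real hash function."""
    return key % num_nodes


def misplaced_rows(rows, placement_key, join_key, num_nodes=NUM_NODES):
    """Count rows stored on a different node than the one holding their join partner.

    Assumes the other table in the join is distributed on the join key, so the
    partner of a row lives on node_for(row[join_key]).
    """
    return sum(
        1
        for row in rows
        if node_for(row[placement_key], num_nodes) != node_for(row[join_key], num_nodes)
    )


# Hypothetical data: 32 orders spread over 8 customers.
orders = [
    {"order_id": o, "customer_id": (o // 4) % 8}
    for o in range(100, 132)
]

# Orders placed by order_id: most rows sit on a different node than their customer.
by_order_id = misplaced_rows(orders, "order_id", "customer_id")

# Orders placed by customer_id: every row is already next to its customer.
by_customer_id = misplaced_rows(orders, "customer_id", "customer_id")

print("rows to ship when distributed by order_id:   ", by_order_id)
print("rows to ship when distributed by customer_id:", by_customer_id)
```

When both tables are placed by the join column, no rows need to move between nodes; when one table is placed by an unrelated column, most of its rows do. This is the communication overhead that choosing a distribution key is meant to avoid, and it is a choice a query planner cannot make for you after the data has been loaded.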