I commented “BugBot run” on a pull request in my repository and got the following error. How should I fix it, or what is the correct way to invoke BugBot?
### 🚨 BugBot failed to run
Remote branch not found for this Pull Request. It may have been merged or deleted (requestId: serverGenReqId_638ef0bd-0446-4323-ab8b-d3fafa8a2e82).
Hi @jkaplan, could you please look at this issue? It keeps telling me that the remote branch is not found.
Remote branch not found for this Pull Request. It may have been merged or deleted (requestId: serverGenReqId_cad1b245-3a60-4497-92dd-d308804f7d3c).
Hi! Thanks for the tag. Just to confirm, is this PR open and the remote branch hasn’t been deleted? If so, is this a forked repo? We have a known bug that we’re working on around that.
Thanks for bearing with us during beta and trying BugBot!
Yes, this PR is open, and the remote branch is still there. You can check the PR here. We just wanted to run a test, but BugBot did not seem to work.
> `main` ← `Fangtangtang:aie-external` (opened 08:39PM - 04 Jun 25 UTC)
## Description ##
This pull request adds support for user-defined external kernels for the MLIR-AIE backend.
Previously, developers were limited to the set of [predefined external kernels](https://github.com/Xilinx/mlir-aie/tree/v1.0/aie_kernels) provided in the repository, which covered only a narrow range of operations.
With this PR, users can now integrate their own custom AIE kernels written in C++ and exposed through extern "C" interfaces.
In addition, this PR also fixes a bug related to DTensor addressing patterns.
### Problems ###
Complex computations on AIE cores are currently implemented using a limited set of [external kernels provided in the mlir-aie repository](https://github.com/Xilinx/mlir-aie/tree/v1.0/aie_kernels), which covers only a narrow range of operations and leaves room for performance improvement.
### Proposed Solutions ###
This PR introduces support for user-defined external kernels. It provides an interface for users to register custom external kernels and use them within Allo kernels.
### Examples ###
Implement an external kernel in `norm.cc`:
```cpp
#include <aie_api/aie.hpp>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <type_traits>

#define NOCPP
#define EPS 1e-6f // epsilon

// RMS-normalize SEQ_LEN rows of HIDDEN elements each:
// out = in * invsqrt(mean(in^2) + EPS) * weight
template <typename T_in, typename T_out, const int SEQ_LEN, const int HIDDEN>
void rms_norm_single_batch(T_in *input_tensor, T_in *weight,
                           T_out *output_tensor) {
  constexpr int vec_factor = 16;
  using vec_t = aie::vector<T_in, vec_factor>;
  event0();
  for (int iter = 0; iter < SEQ_LEN; iter++) {
    T_in *__restrict input_ptr = input_tensor;
    T_in *__restrict weight_ptr = weight;
    T_out *__restrict output_ptr = output_tensor;
    float square_sum = 0.0f;
    const int F = HIDDEN / vec_factor;
    // First pass: accumulate the sum of squares over the row.
    for (int i = 0; i < F; i++) {
      vec_t input_vec = aie::load_v<vec_factor>(input_ptr);
      input_ptr += vec_factor;
      vec_t square_vec = aie::mul(input_vec, input_vec);
      square_sum += aie::reduce_add(square_vec);
    }
    vec_t square_sum_vec =
        aie::broadcast<T_in, vec_factor>(square_sum / HIDDEN + EPS);
    vec_t rms = aie::invsqrt(square_sum_vec);
    input_ptr = input_tensor;
    // Second pass: scale each element by 1/rms and the weight.
    for (int i = 0; i < F; i++) {
      vec_t input_vec = aie::load_v<vec_factor>(input_ptr);
      input_ptr += vec_factor;
      vec_t normed = aie::mul(input_vec, rms);
      vec_t weight_vec = aie::load_v<vec_factor>(weight_ptr);
      weight_ptr += vec_factor;
      vec_t result = aie::mul(normed, weight_vec);
      aie::store_v(output_ptr, result);
      output_ptr += vec_factor;
    }
    // Advance to the next row.
    input_tensor += HIDDEN;
    output_tensor += HIDDEN;
  }
  event1();
}

// exposed via extern "C" interfaces
extern "C" {
void layer_norm(float A_in[4][512], float B_in[512], float C_out[4][512]) {
  rms_norm_single_batch<float, float, 4, 512>(&A_in[0][0], B_in, &C_out[0][0]);
}
}
```
Register with `ExternalModule` and use in an Allo kernel.
```python
# register user-defined external kernel
norm = ExternalModule(
    top="layer_norm",     # name of the top-level function defined with `extern "C"`
    impl_path="norm.cc",  # path to the user-provided source file implementing the kernel
    input_idx=[0, 1],     # indices of input arguments in the argument list
    output_idx=[2],       # indices of output arguments in the argument list
)

Ty = float32
M, N = seq_len, hidden_size

@df.region()
def top():
    @df.kernel(mapping=[1])
    def core(A: Ty[M, N] @ LyA, B: Ty[N] @ Ly, C: Ty[M, N] @ LyA):
        norm(A, B, C)  # invoke the registered external kernel
```
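Since the checklist below asks for test coverage, one common way to validate an external kernel like this is to compare the AIE output against a host-side NumPy reference. The sketch below is an illustrative assumption, not part of the PR: it reimplements the same math as `rms_norm_single_batch` (`out = in * invsqrt(mean(in^2) + EPS) * weight`) so the device result can be checked with `np.allclose`.

```python
# Hypothetical NumPy reference for the rms_norm_single_batch kernel above.
# The function name and shapes are illustrative, chosen to match the
# extern "C" wrapper (SEQ_LEN=4, HIDDEN=512) in norm.cc.
import numpy as np

EPS = 1e-6  # matches the EPS constant in norm.cc


def rms_norm_reference(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Row-wise RMSNorm: x * 1/sqrt(mean(x^2) + EPS) * weight."""
    mean_sq = np.mean(x.astype(np.float32) ** 2, axis=-1, keepdims=True)
    return (x / np.sqrt(mean_sq + EPS)) * weight


# Random inputs with the shapes used by layer_norm(A_in, B_in, C_out)
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 512)).astype(np.float32)
B = rng.standard_normal(512).astype(np.float32)
C_ref = rms_norm_reference(A, B)
# A test would then compare C_ref against the buffer produced by the
# Allo kernel, e.g. np.allclose(C_ref, C_device, rtol=1e-3, atol=1e-3).
```

Tolerances would need tuning, since the AIE `invsqrt` and float accumulation order differ slightly from NumPy's double-precision reductions.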
## Checklist ##
Please make sure to review and check all of these items:
- [x] PR's title starts with a category (e.g. [Bugfix], [IR], [Builder], etc)
- [x] All changes have test coverage (It would be good to provide ~2 different test cases to test the robustness of your code)
- [x] Pass the [formatting check](https://cornell-zhang.github.io/allo/developer/index.html#id1) locally
- [x] Code is well-documented
Yeah, that’s a forked repo, so this is the known issue. Thanks for reporting; we will fix it as soon as we can!