/r/sycl


SYCL makes it easier for developers to write software using a C++ single-source parallel programming model. This sub is for sharing news and tutorials and for discussing SYCL. http://sycl.tech

What is SYCL?

SYCL is a specification that defines a single-source C++ programming layer built on top of OpenCL. It allows developers to parallelize standard C++ code on OpenCL-compatible devices such as GPUs.
Specifications
V1.2:
https://www.khronos.org/registry/SYCL/specs/sycl-1.2.pdf
V2.2:
https://www.khronos.org/registry/SYCL/specs/sycl-2.2.pdf
Implementations
V1.2:
https://www.codeplay.com/products/computesuite/computecpp
V2.2:
https://github.com/triSYCL/triSYCL

Related Subreddits

/r/gpgpu
/r/OpenCL
/r/cpp


367 Subscribers

3

How to Get Started With SYCL

Hello, I've been trying to figure out how to get started with SYCL, but I can't find any resources. I'm not sure if there is an SDK I can download or something. I was hoping I could just include SYCL in my C++ project and start writing kernels for the GPU. Any help would be appreciated.

7 Comments
2024/04/02
02:43 UTC

1

Can I limit the number of cores in a host run? (Intel oneAPI)

I want to compare SYCL to other parallel programming systems, and for now I'm doing host runs. So I want to do a scaling study with the number of cores set to 1, 2, 5, 10, 20, 50.

I have not found a mechanism (probably specific to Intel oneAPI) to limit the number of cores. That should be possible, right? Something with TBB or OpenCL or whatever.

1 Comment
2024/03/27
18:01 UTC

3

Leverage parallelism capabilities of SYCL for faster multiarchitecture parallel programming in C++.

1 Comment
2024/03/26
20:02 UTC

4

Using 3rd party library in SYCL Code

Hello,

So I was wondering if I could use the C++ library PcapPlusPlus and its header files in my SYCL code. I am using CentOS Stream 8 and oneAPI Base Toolkit 2023.1. I downloaded the GitHub repository and built the files. After placing the header files in the necessary folders, I tried to compile the PcapPlusPlus code example with the icpx command but got a lot of "undefined reference" errors. After some research, I can't find anything that explicitly rules out using 3rd-party libraries. Does anybody have an idea what I could be missing, or is this straight up not possible to do?

6 Comments
2024/03/12
15:29 UTC

3

Utilizing SYCL in Database Engines

I’m in the process of developing a prototype for a database engine that targets multiple architectures and accelerators. Maintaining a codebase for x86_64, ARM, various GPUs, and different accelerators is quite challenging, so I’m exploring ways to execute queries on different accelerators using a unified codebase.

I’ve experimented with LLVM MLIR and attempted to lower the affine dialect to various architectures. However, the experience was less than satisfactory, as it seemed that either I was not using it correctly, or there were missing compiler passes when I was lowering it to a code targeting a specific architecture.

I’m considering whether SYCL could be a solution to this problem. Is it feasible to generate SYCL or LLVM IR from SYCL at runtime? This capability would allow me to optimize the execution workflow in my database prototype.

Finally, given the context I’ve provided, would you recommend using SYCL, or am I perhaps using the wrong tool to address this problem?
For clarity, I'd like to build it for both Windows and Linux.

5 Comments
2024/02/05
13:59 UTC

2

CUDA conversion

Sorry to spam this subreddit, if there are other places to discuss/ask for help please say so.

I found this code (CUDA) in a paper and, with the help of this table, tried to convert it to SYCL. The conversion compiles and runs, but it gives the wrong answer.
The code is SpMV in CSR format.

__global__ void spmv_csr_vector_kernel(const int num_rows, const int *ptr,
                                       const int *indices, const float *data,
                                       const float *x, float *y) {
  extern __shared__ float vals[]; // dynamic shared memory, sized at launch
  int thread_id = blockDim.x * blockIdx.x + threadIdx.x; // global thread index
  int warp_id = thread_id / 32;                          // global warp index
  int lane = thread_id & (32 - 1); // thread index within the warp
  // one warp per row
  int row = warp_id;
  if (row < num_rows) {
    int row_start = ptr[row];
    int row_end = ptr[row + 1];
    // compute running sum per thread
    vals[threadIdx.x] = 0;
    for (int jj = row_start + lane; jj < row_end; jj += 32)
      vals[threadIdx.x] += data[jj] * x[indices[jj]];
    // parallel reduction in shared memory
    if (lane < 16)
      vals[threadIdx.x] += vals[threadIdx.x + 16];
    if (lane < 8)
      vals[threadIdx.x] += vals[threadIdx.x + 8];
    if (lane < 4)
      vals[threadIdx.x] += vals[threadIdx.x + 4];
    if (lane < 2)
      vals[threadIdx.x] += vals[threadIdx.x + 2];
    if (lane < 1)
      vals[threadIdx.x] += vals[threadIdx.x + 1];
    // first thread writes the result
    if (lane == 0)
      y[row] += vals[threadIdx.x];
  }
}

And here is my SYCL implementation:

void SPMV_Parallel(sycl::queue q, int compute_units, int work_group_size,
                   int num_rows, int *ptr, int *indices, float *data, float *x,
                   float *y) {

  float *vals = sycl::malloc_shared<float>(work_group_size, q);
  q.fill(y, 0.0f, num_rows).wait();
  q.fill(vals, 0, work_group_size).wait();

  q.submit([&](sycl::handler &cgh) {
     const int WARP_SIZE = 32;

     assert(work_group_size % WARP_SIZE == 0);

     cgh.parallel_for(
         sycl::nd_range<1>(compute_units * work_group_size, work_group_size),
         [=](sycl::nd_item<1> item) {
           int thread_id = item.get_local_range(0) * item.get_group(0) *
                           item.get_local_id(0);
           int warp_id = thread_id / WARP_SIZE;
           int lane = thread_id & (WARP_SIZE - 1);
           int row = warp_id;

           if (row < num_rows) {
             int row_start = ptr[row];
             int row_end = ptr[row + 1];
             vals[item.get_local_id(0)] = 0;
             for (int jj = row_start + lane; jj < row_end; jj += WARP_SIZE) {
               vals[item.get_local_id(0)] += data[jj] * x[indices[jj]];
             }

             if (lane < 16)
               vals[item.get_local_id(0)] += vals[item.get_local_id(0) + 16];
             if (lane < 8)
               vals[item.get_local_id(0)] += vals[item.get_local_id(0) + 8];
             if (lane < 4)
               vals[item.get_local_id(0)] += vals[item.get_local_id(0) + 4];
             if (lane < 2)
               vals[item.get_local_id(0)] += vals[item.get_local_id(0) + 2];
             if (lane < 1)
               vals[item.get_local_id(0)] += vals[item.get_local_id(0) + 1];

             if (lane == 0)
               y[row] += vals[item.get_local_id(0)];
           }
         });
   }).wait();
  sycl::free(vals, q);
}

Any guidance would be greatly appreciated!
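One thing worth checking when comparing the two listings: the CUDA kernel computes the global thread index as `blockDim.x * blockIdx.x + threadIdx.x`, while the posted SYCL kernel multiplies all three terms (`get_local_range(0) * get_group(0) * get_local_id(0)`). The hypothetical helpers below (not from the post) contrast the two formulas in plain C++:

```cpp
#include <cassert>

// CUDA's global index: blockDim.x * blockIdx.x + threadIdx.x.
// The SYCL equivalent is item.get_global_id(0), or equivalently
// item.get_local_range(0) * item.get_group(0) + item.get_local_id(0).
int global_index(int local_range, int group_id, int local_id) {
  return local_range * group_id + local_id;
}

// The posted kernel's expression (all multiplications), for comparison.
int buggy_index(int local_range, int group_id, int local_id) {
  return local_range * group_id * local_id;
}
```

For example, with a local range of 32, group 2, local id 5, the correct index is 69 while the all-multiply version gives 320, and every local id 0 collapses to index 0. Separately, `vals` here is a single `malloc_shared` array shared by all work-groups, whereas the CUDA `__shared__` array is per-block; in SYCL that role is played by a `sycl::local_accessor` (plus `group_barrier`, or sub-group reduction functions) rather than one global scratch buffer.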

9 Comments
2024/01/31
20:02 UTC

6

Best Ways to Learn SYCL

Hi everyone,

I'm doing a master's thesis in heterogeneous computing and am expected to program in SYCL; the thing is, I am having a hard time finding online materials to learn it.

I am aware of sycl-academy, one workshop given by EuroCC Sweden, and a book (`Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL`), but it seems that the examples and the classes are too simple.

I have experience in some parallel programming (OpenMP and Open MPI), but all at the CPU level; working with a GPU is something completely new.

I am mostly missing (harder/more complex) exercises/examples, and having a hard time understanding `nd_range`.

Do you guys recommend anything? How did you learn SYCL? Do you use SYCL for any project?
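On the `nd_range` point: `nd_range<1>(global, local)` launches `global` work-items partitioned into work-groups of `local` items each (`global` must be divisible by `local`), and each item sees a group id and a local id that recombine into its global id. A plain-C++ sketch of that decomposition, with hypothetical helper names:

```cpp
#include <cassert>

struct Item1D {
  int global_id, group_id, local_id;
};

// Mirrors what nd_range<1>(global_size, local_size) gives each work-item:
// group_id  = item.get_group(0)
// local_id  = item.get_local_id(0)
// and global_id == group_id * local_size + local_id always holds.
Item1D decompose(int global_id, int local_size) {
  return {global_id, global_id / local_size, global_id % local_size};
}
```

So for `nd_range<1>(8, 4)` there are two work-groups of four items; item 5 is local id 1 of group 1. The point of the grouping is that only items in the same work-group can share local memory and synchronize at a barrier.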

9 Comments
2024/01/30
18:17 UTC

5

CUDA to SYCL help

Hi, I need help converting the following CUDA code to SYCL. I am using unified shared memory, but the array y always returns 0 at every index.

I am genuinely lost. Any help is greatly appreciated.

__global__ void
spmv_csr_scalar_kernel(
    const int num_rows,
    const int *row_offsets,
    const int *column_indices,
    const float *values,
    const float *x,
    float *y)
{
    int row = blockDim.x * blockIdx.x + threadIdx.x;
    if (row < num_rows)
    {
        float dot = 0;
        int row_start = row_offsets[row];
        int row_end = row_offsets[row + 1];
        for (int jj = row_start; jj < row_end; jj++)
            dot += values[jj] * x[column_indices[jj]];
        y[row] += dot;
    }
}

I have tried the following:

void SPMVV_Parallel(sycl::queue q, int n, // n: number of rows
                    const CompressedSparseRow *matrix,
                    const float *x, float *y)
{
  q.parallel_for(sycl::range<1>(n), [=](sycl::id<1> gid) {
     int row = gid[0];
     if (row < n) {
       float dot = 0;
       int row_start = matrix->row_offsets[row];
       int row_end = matrix->row_offsets[row + 1];
       for (int i = row_start; i < row_end; i++) {
         dot += matrix->values[i] * x[matrix->column_indices[i]];
       }
       y[row] += dot;
     }
   }).wait();
}
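As a sanity check for any parallel version, here is a plain serial CSR SpMV in standard C++ that the per-row computation should match. The names are illustrative, not the fields of the poster's `CompressedSparseRow` struct:

```cpp
#include <cassert>

// y[row] += sum of values[jj] * x[column_indices[jj]] over the row's
// nonzeros, where row_offsets[row] .. row_offsets[row+1] delimit them.
void spmv_csr_serial(int num_rows, const int *row_offsets,
                     const int *column_indices, const float *values,
                     const float *x, float *y) {
  for (int row = 0; row < num_rows; ++row) {
    float dot = 0.0f;
    for (int jj = row_offsets[row]; jj < row_offsets[row + 1]; ++jj)
      dot += values[jj] * x[column_indices[jj]];
    y[row] += dot;
  }
}
```

Comparing the parallel output element-by-element against this reference on a tiny matrix usually pinpoints whether the indexing or the accumulation is at fault.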
6 Comments
2024/01/10
18:17 UTC

4

SYCL goes Green with SYnergy

Biagio Cosenza from the University of Salerno / CINECA Supercomputing Center pens this blog on the SYnergy research project that enables efficient C++ based heterogeneous parallel programming with the Khronos SYCL API.

https://khr.io/12h

1 Comment
2023/12/11
16:31 UTC

1

Integrating SYCL into an existing large project

I'm looking to offload some data processing in a large existing application to the GPU. This project has a base library that does all the math, a Qt app on top of the library, and a separate gRPC app that acts as a web API to that library. The build system uses CMake and vcpkg to pull in dependencies.

Is there a way to integrate any of the SYCL implementations into a project like this? Writing a SYCL project from scratch is easy, but I can't find any good information on how to add it, or whether it's even possible to use SYCL with a pretty standard CMake/vcpkg project. It's definitely not as easy as changing the compiler and rebuilding.

In the past, I've compiled OpenCL down to SPIR or used CUDA. Both of those are the easy way to go, but I'm trying to look toward the future where I can.

7 Comments
2023/11/14
02:30 UTC

3

How to debug SYCL program running on GPU?

I'm a beginner and I need to debug a SYCL program running on a GPU (NVIDIA). How should I move forward, and what tools should I use? Do I need PoCL for this?

2 Comments
2023/11/09
09:12 UTC

3

Any hope for a fully portable, compiler-agnostic implementation?

Hello everyone. I was looking into the library-only compilation flow of Open SYCL. From what I read, it seems it tries to support every compiler and every OS, but it actually doesn't support many backends.

Is there a project, or any hope, that using SYCL may become as portable as the graphics APIs (e.g. include and link the lib, build using any compiler, run anywhere by lowering at runtime)?

Or would this require new language tooling such as reflection?

9 Comments
2023/09/09
22:11 UTC

5

SYCL implementation for Windows supporting NVIDIA/AMD GPUs?

Is there actually any out-of-the-box SYCL implementation, or plugins for any existing SYCL implementation, for Windows that supports NVIDIA and AMD GPUs as compute devices?

There is a lot of discussion on the internet, including posts in this sub, for example "Learn SYCL or CUDA?", where one of the popular answers was: CUDA is NVIDIA-only, and SYCL is universal.

But the thing is that I can't compute on my NVIDIA GPU using SYCL on Windows. I installed DPC++, and really liked the concept of SYCL, but all I can get is mediocre-performing CPU code (ISPC-based solutions are up to twice as fast in my tests) and GPU code for Intel GPUs, which runs on my integrated Intel GPU even slower than the CPU variant (and the default device selector prefers the integrated GPU, hm). I googled other implementations, and some of them provide NVIDIA/AMD support, but only for Linux.

Am I missing something?

18 Comments
2023/08/28
08:45 UTC

1

Allocate struct on device. Please help

Hiya, I'm pretty new to SYCL, but I want to allocate a struct and all its members on a SYCL device, and I keep getting errors about illegal memory accesses in CUDA. Can I have some help, please, or an alternative suggestion?

This is my code. I create a struct, allocate it on the device along with an int array, populate the int array, and then print it out.

#include <sycl/sycl.hpp>

struct test_struct {
  int* data = nullptr;
};

int test(test_struct **t){
  try {
      sycl::queue q;
      *t = sycl::malloc_shared<test_struct>(1, q);
      int* host_res = (int*) malloc(20 * sizeof(int));
      size_t size = 20;
      (*t)->data = sycl::malloc_device<int>(size, q);
      q.parallel_for(sycl::range<1>(size), [=](sycl::id<1> i) {
          (*t)->data[i] = i;
      }).wait();
      q.memcpy(host_res,(*t)->data,size * sizeof(int)).wait();
      for (size_t i = 0; i < 20; i++)
      {
          std::cout << host_res[i] << std::endl;
      }
      sycl::free((*t)->data, q);
    }
    catch (sycl::exception &e) {
        std::cout << "SYCL exception caught: " << e.what() << std::endl;
    }
  return 0;
}
int main() {
  test_struct *t;
  test(&t);
  return 0;
};
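One plausible cause of the illegal access (offered as a hypothesis, not a confirmed diagnosis): the kernel lambda captures `t`, a pointer to a pointer living on the host stack, so the device has to dereference host memory to evaluate `(*t)->data`. The struct itself is in shared USM; it is the extra indirection through `t` that is not device-visible. Copying the struct pointer into a local variable first means the lambda captures the device-visible address by value. The plain-C++ sketch below shows the capture rewrite, with an ordinary loop standing in for `parallel_for`:

```cpp
#include <cassert>

struct test_struct {
  int *data = nullptr;
};

// Capture the struct pointer itself, not the address of the host
// variable that holds it: a SYCL kernel could follow `dev` (USM),
// but not `t` from the original post (an address on the host stack).
void fill(test_struct *s, int n) {
  test_struct *dev = s;  // local copy; the lambda captures it by value
  auto body = [=](int i) { dev->data[i] = i; };  // stand-in kernel body
  for (int i = 0; i < n; ++i) body(i);
}
```

In the original code that would mean something like `test_struct *dev = *t;` before `q.submit`, with the kernel using `dev->data[i]` instead of `(*t)->data[i]`.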

1 Comment
2023/06/26
16:58 UTC

3

oneAPI DevSummit for general topics like AI and HPC - June 13th, 2023

Hello SYCLers - wanted to let you all know that there is a oneAPI DevSummit on June 13th! We have a great State of the Union talk where you can find out the latest that is happening in the ecosystem, and all the chat is on Discord. It'll be a fun way to hang out with fellow SYCLers and oneAPI enthusiasts.

Looking forward to seeing you there!

https://www.oneapi.io/events/oneapi-devsummit-2023/

Feedback of course is welcome. :-)

0 Comments
2023/06/08
19:52 UTC

1

Signal processing libraries for SYCL.

Hi,

I hope you're doing well.

I am searching for signal processing and linear algebra libraries for SYCL. In addition to oneMKL, I am looking for other libraries that can run under DPC++ (or hipSYCL or triSYCL).

Cheers,

0 Comments
2023/05/23
09:08 UTC

2

RFP: SYCL 2020 Reference Guide

The Khronos Group has issued an RFP for a SYCL 2020 Reference Guide. The project aims to improve the SYCL developer ecosystem by providing a more usable version of the SYCL specification: an online, searchable reference along the lines of cppreference.com, through which developers can rapidly find relevant material via top-ranked web searches or browsing.

Submit your bid by Monday, June 12, 2023!

https://members.khronos.org/document/dl/30206

0 Comments
2023/05/19
14:58 UTC

10

IWOCL & SYCLcon 2023 Videos and Presentations

Videos and presentations from the talks and panels presented at last month's IWOCL & SYCLcon 2023 are now available!

https://www.iwocl.org/iwocl-2023/conference-program/

0 Comments
2023/05/02
23:53 UTC

2

device::aspects?

The Intel compiler reports that `sycl::info::platform::extensions` is deprecated, but its replacement fails to compile:

Compiling: icpx -g   -std=c++17 -fsycl -O2 -g      -c devices.cxx
with icpx=/scratch1/projects/compilers/oneapi_2023.1.0/compiler/2023.1.0/linux/bin/icpx
devices.cxx:39:41: error: no member named 'aspects' in namespace 'sycl::info::device'
      plat.get_info<sycl::info::device::aspects>();
                    ~~~~~~~~~~~~~~~~~~~~^

What am I missing?

5 Comments
2023/04/25
20:00 UTC

1

Why has hipSYCL made this choice?

Hi,
I am trying to understand the runtime of hipSYCL. More than that, I am trying to understand the reasons behind some choices, such as having a runtime library that dispatches device code to backend runtimes instead of having a queue for each backend runtime. I saw a keynote on YouTube presented by Mr. Aksel Alpay. He states that this choice was made to improve performance, but I didn't get the idea yet :D.
My question is: why was the choice made to put a hipSYCL runtime layer between the queues and the backends' runtimes?
Thank you

6 Comments
2023/04/22
08:47 UTC

7

SYCL 2020 Revision 7 Released

Just announced at IWOCL / SYCLcon, the Khronos Group has released SYCL 2020 Revision 7.

See what changes were made: https://www.khronos.org/news/permalink/khronos-group-releases-sycl-2020-revision-7

0 Comments
2023/04/19
16:09 UTC

1

In DPC++ (Intel's implementation of SYCL), do the work-items within a work-group execute in parallel?

Hello everyone

I am currently working on a project using the Khronos SYCL standard. Before starting to write code, I am reading about Intel's DPC++ language, which implements the SYCL standard. Unfortunately, I don't have much experience programming in OpenCL (or equivalents); in fact, this is my first time doing parallel programming. Therefore, I have some trouble understanding basic concepts such as the nd-range.

I have understood that the nd-range is a way to group work-items into work-groups for performance reasons. Then I asked: how are work-groups executed, and how are the work-items within work-groups executed? I have understood that work-groups are mapped to compute units (inside a GPU, for example), so I guess that work-groups could be executed in parallel; from a hardware point of view, it is totally possible. At this point another question arises: how are the work-items executed?

I have answered this question as follows. Based on "Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL" by James Reinders, the DPC++ runtime guarantees that work-items may be executed concurrently (which is totally different from in parallel). In addition, the mapping of work-items to hardware cores (compute units) is implementation-defined, so it is quite unclear how things would be executed; it really depends on the hardware. My answer was the following: the execution of work-items within a work-group depends on the hardware. If a compute unit (in a GPU, for example) has enough cores to execute the work-items, they will be executed in parallel; otherwise, they will be executed concurrently.

Is this right? If not, what am I missing?
Thank you in advance

5 Comments
2023/04/03
06:29 UTC

2

Wanting to try SYCL on a low cost board. What are my options?

Hello, as the title says, I would like to try an implementation of SYCL on a low-cost board. Right now my eyes are set on ComputeCpp, but I'm open to alternatives. My doubts are about which board I could use, since I find it hard to identify boards that support it just by reading the specs.

Can you advise on which board(s) I could use? I'm trying to stay low cost (say, $200 max or about that range). As a side question: in general, while reading a board's specs, what should I look for? Something like "OpenCL compatible"?

9 Comments
2023/03/30
05:20 UTC

1

No kernel named was found. First SYCL app

I'm trying to code my first SYCL app: just some falling sand. The details aren't important: if a cell has sand and the cell beneath is empty, move the sand; otherwise try bottom-left or bottom-right, and if there's no room, do nothing. I don't have anything to visualize the particles yet, but that's for later.

#pragma warning (push, 0)
#include <CL/sycl.hpp>
#include <iostream>
#pragma warning (pop)

constexpr int WIDTH = 1024;
constexpr int HEIGHT = 1024;

class FallingPowder {
public:
  static int simulate(sycl::accessor<int, 2, sycl::access::mode::read_write,
                                     sycl::access::target::global_buffer>
                          grid_accessor,
                      sycl::item<2> item) {
    size_t x = item.get_id(0);
    size_t y = item.get_id(1);

    int current_cell = grid_accessor[{x, y}];
    int below_cell = grid_accessor[{x, y - 1}];
    int below_left_cell = grid_accessor[{x - 1, y - 1}];
    int below_right_cell = grid_accessor[{x + 1, y - 1}];

    // Check if the current cell has a particle and the cell below is empty.
    if (current_cell == 1) {
      if (below_cell == 0) {
        // Move the particle down.
        grid_accessor[{x, y - 1}] = 1;
        grid_accessor[{x, y}] = 0;
      } else if (below_left_cell == 0 && below_right_cell == 0) {
        // Move the particle down.
        if (rand() % 2) {
          grid_accessor[{x - 1, y - 1}] = 1;
        } else {
          grid_accessor[{x + 1, y - 1}] = 1;
        }
        grid_accessor[{x, y}] = 0;
      } else if (below_left_cell == 0) {
        grid_accessor[{x - 1, y - 1}] = 1;
        grid_accessor[{x, y}] = 0;
      } else if (below_right_cell == 0) {
        grid_accessor[{x + 1, y - 1}] = 1;
        grid_accessor[{x, y}] = 0;
      }
    }

    return grid_accessor[{x, y}];
  }
};

int main() {
  sycl::queue q(sycl::default_selector{});
  std::vector<int> grid(WIDTH * HEIGHT, 0);
  for (int x = (WIDTH / 2) - 50; x < (WIDTH / 2) + 50; x++) {
    for (int y = 0; y < 10; y++) {
      grid[x + y * WIDTH] = 1;
    }
  }

  sycl::buffer<int, 2> grid_buffer(grid.data(), sycl::range<2>(WIDTH, HEIGHT));

  for (int t = 0; t < 1000; t++) {
    q.submit([&](sycl::handler &cgh) {
      auto grid_accessor =
          grid_buffer.get_access<sycl::access::mode::read_write>(cgh);

      cgh.parallel_for<class FallingPowder>(
          sycl::range<2>(WIDTH, HEIGHT - 1), [=](sycl::item<2> item) {
            grid_accessor[item] = FallingPowder::simulate(grid_accessor, item);
          });
    });
  }

  q.wait_and_throw();

  return 0;
}

It compiles fine, but when I run it I get:

terminate called after throwing an instance of 'sycl::_V1::runtime_error' what(): No kernel named was found -46 (PI_ERROR_INVALID_KERNEL_NAME) Aborted (core dumped)

1 Comment
2023/03/27
15:14 UTC

2

New SYCL for Safety Critical Working Group announced

The Khronos Group has announced the creation of the SYCL SC Working Group to create a high-level heterogeneous computing framework for streamlining certification of safety-critical systems in automotive, avionics, medical, and industrial markets. SYCL SC will leverage the proven SYCL 2020 standard for parallel programming of diverse computing devices using standard C++17. Over the past year, the safety-critical community has gathered in the Khronos SYCL Safety-Critical Exploratory Forum to build consensus on use cases and industry requirements to catalyze and guide the design of this new open standard. The SYCL SC Working Group is open to any Khronos member, and Khronos membership is open to any company. https://khr.io/107

0 Comments
2023/03/15
19:20 UTC

1

Meson and SYCL/oneAPI meetup.com

Hi folks!

I wanted to call out a few things: 1) I just posted part 2 of a multi-part blog post on how to build a container with the DPC++ compiler in an all-open-source toolchain to compile SYCL apps. I'd love to hear feedback. The 2nd blog post is about using Meson as a build system. As luck would have it, we have a meetup with the Meson community and oneAPI/HPC/AI, and if the post intrigues you, you could join our meetup and ask questions.

Here are the two posts:

https://dev.to/oneapi/modern-software-development-tools-and-oneapi-part-1-40km

https://dev.to/oneapi/modern-software-development-tools-and-oneapi-part-2-4bjp

The meetup is this Friday (1/13/2023) at 11:30am PDT

https://www.meetup.com/oneapi-community-us/events/290726282/

0 Comments
2023/01/11
16:23 UTC
